Overview

Dataset statistics

Number of variables: 9
Number of observations: 105840
Missing cells: 3024
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 MiB
Average record size in memory: 72.0 B
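
The figures above can be cross-checked with a little arithmetic. The sketch below assumes the distinct counts reported in the Variables section and that every combination of year, geography, sector, characteristic and indicator occurs exactly once. That is an inference from the uniform category counts and the absence of duplicate rows, not something the report states.

```python
# Distinct counts as reported in the Variables section of this report.
distinct_counts = {
    "REF_DATE": 12,
    "GEO": 14,              # DGUID maps one-to-one onto GEO, so it adds no combinations
    "Sector": 5,
    "Characteristics": 18,
    "Indicators": 7,
}

n_variables = 9
n_missing_cells = 3024

# Assumption: each combination of categories appears exactly once,
# so the row count is the product of the distinct counts.
n_rows = 1
for count in distinct_counts.values():
    n_rows *= count

total_cells = n_rows * n_variables
missing_pct = 100 * n_missing_cells / total_cells

print(n_rows)                  # 105840
print(round(missing_pct, 1))   # 0.3
```

Under that assumption the product matches the reported number of observations, and the missing-cell share (3024 of 952560 cells) rounds to the reported 0.3%.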

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other field (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
VALUE has 3024 (2.9%) missing values (Missing)
DGUID is uniformly distributed (Uniform)
GEO is uniformly distributed (Uniform)
Sector is uniformly distributed (Uniform)
Characteristics is uniformly distributed (Uniform)
Indicators is uniformly distributed (Uniform)

Reproduction

Analysis started: 2023-11-27 20:36:56.149883
Analysis finished: 2023-11-27 20:37:01.546505
Duration: 5.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
Median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520688
Coefficient of variation (CV): 0.0017127605
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.1332052 × 10^8
Variance: 11.916779
Monotonicity: Increasing
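
As a sanity check, the descriptive statistics for REF_DATE can be reproduced from the value counts alone. The sketch below rebuilds the column under the assumption that each of the 12 years appears 8820 times, as the value table for this variable shows. It uses the sample variance (n - 1 denominator), which appears to be what the report shows.

```python
import math
import statistics

# Assumption: each year 2010..2021 appears exactly 8820 times,
# as listed in the REF_DATE value table.
years = [year for year in range(2010, 2022) for _ in range(8820)]

mean = statistics.fmean(years)
variance = statistics.variance(years)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                         # coefficient of variation

print(len(years))            # 105840
print(mean)                  # 2015.5
print(round(variance, 6))    # 11.916779
print(round(std, 7))         # 3.4520688
print(round(cv, 8))          # 0.00171276
print(sum(years))            # 213320520
```

The sum of 213320520 is the value the report shows in scientific notation as 2.1332052 × 10^8.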
Histogram with fixed size bins (bins=12)
Common values

Value             Count  Frequency (%)
2010              8820   8.3%
2011              8820   8.3%
2012              8820   8.3%
2013              8820   8.3%
2014              8820   8.3%
2015              8820   8.3%
2016              8820   8.3%
2017              8820   8.3%
2018              8820   8.3%
2019              8820   8.3%
Other values (2)  17640  16.7%

Minimum 10 values

Value  Count  Frequency (%)
2010   8820   8.3%
2011   8820   8.3%
2012   8820   8.3%
2013   8820   8.3%
2014   8820   8.3%
2015   8820   8.3%
2016   8820   8.3%
2017   8820   8.3%
2018   8820   8.3%
2019   8820   8.3%

Maximum 10 values

Value  Count  Frequency (%)
2021   8820   8.3%
2020   8820   8.3%
2019   8820   8.3%
2018   8820   8.3%
2017   8820   8.3%
2016   8820   8.3%
2015   8820   8.3%
2014   8820   8.3%
2013   8820   8.3%
2012   8820   8.3%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Value             Count
2016A000011124    7560
2016A000210       7560
2016A000211       7560
2016A000212       7560
2016A000213       7560
Other values (9)  68040

Length

Max length: 14
Median length: 11
Mean length: 11.214286
Min length: 11

Characters and Unicode

Total characters: 1186920
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
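
To illustrate what a Unicode general category is, the sketch below classifies the characters of one DGUID value with Python's standard `unicodedata` module. This only shows the idea. It is not necessarily how the profiling tool computes its category counts.

```python
import unicodedata
from collections import Counter

# One DGUID value taken from this report.
dguid = "2016A000011124"

# unicodedata.category returns a two-letter general category code,
# e.g. "Nd" (Decimal Number) or "Lu" (Uppercase Letter).
categories = Counter(unicodedata.category(ch) for ch in dguid)

print(categories["Nd"])   # 13
print(categories["Lu"])   # 1
```

Thirteen decimal digits and one uppercase letter agree with the two distinct categories reported for this variable.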

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016A000011124
2nd row: 2016A000011124
3rd row: 2016A000011124
4th row: 2016A000011124
5th row: 2016A000011124

Common Values

Value             Count  Frequency (%)
2016A000011124    7560   7.1%
2016A000210       7560   7.1%
2016A000211       7560   7.1%
2016A000212       7560   7.1%
2016A000213       7560   7.1%
2016A000224       7560   7.1%
2016A000235       7560   7.1%
2016A000246       7560   7.1%
2016A000247       7560   7.1%
2016A000248       7560   7.1%
Other values (4)  30240  28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Value                      Count
Canada                     7560
Newfoundland and Labrador  7560
Prince Edward Island       7560
Nova Scotia                7560
New Brunswick              7560
Other values (9)           68040

Length

Max length: 25
Median length: 14.5
Mean length: 11.714286
Min length: 5

Characters and Unicode

Total characters: 1239840
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

Value                      Count  Frequency (%)
Canada                     7560   7.1%
Newfoundland and Labrador  7560   7.1%
Prince Edward Island       7560   7.1%
Nova Scotia                7560   7.1%
New Brunswick              7560   7.1%
Quebec                     7560   7.1%
Ontario                    7560   7.1%
Manitoba                   7560   7.1%
Saskatchewan               7560   7.1%
Alberta                    7560   7.1%
Other values (4)           30240  28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
(space) 60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
(space) 60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
(space) 60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
(space) 60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Sector
Categorical

UNIFORM 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Value                                                                 Count
Total non-profit institutions                                         21168
Total non-profit institutions excluding governments                   21168
Non-profit institutions serving households (community organizations)  21168
Business non-profit institutions                                      21168
Government non-profit institutions                                    21168

Length

Max length: 68
Median length: 34
Mean length: 42.8
Min length: 29

Characters and Unicode

Total characters: 4529952
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

Value                                                                 Count  Frequency (%)
Total non-profit institutions                                         21168  20.0%
Total non-profit institutions excluding governments                   21168  20.0%
Non-profit institutions serving households (community organizations)  21168  20.0%
Business non-profit institutions                                      21168  20.0%
Government non-profit institutions                                    21168  20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
(space) 317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
(space) 317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
(space) 317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
(space) 317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Value                          Count
Male employees                 5880
Female employees               5880
Immigrant employees            5880
Non-immigrant employees        5880
Indigenous identity employees  5880
Other values (13)              76440

Length

Max length: 33
Median length: 28
Mean length: 19.5
Min length: 14

Characters and Unicode

Total characters: 2063880
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

Value                              Count  Frequency (%)
Male employees                     5880   5.6%
Female employees                   5880   5.6%
Immigrant employees                5880   5.6%
Non-immigrant employees            5880   5.6%
Indigenous identity employees      5880   5.6%
Non-indigenous identity employees  5880   5.6%
Visible minority                   5880   5.6%
Not a visible minority             5880   5.6%
High school diploma and less       5880   5.6%
Trade certificate                  5880   5.6%
Other values (8)                   47040  44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
(space) 235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
(space) 235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
(space) 235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
(space) 235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Value                        Count
Number of jobs               15120
Hours worked                 15120
Wages and salaries           15120
Average annual hours worked  15120
Average weekly hours worked  15120
Other values (2)             30240

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2268000
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

Value                              Count  Frequency (%)
Number of jobs                     15120  14.3%
Hours worked                       15120  14.3%
Wages and salaries                 15120  14.3%
Average annual hours worked        15120  14.3%
Average weekly hours worked        15120  14.3%
Average annual wages and salaries  15120  14.3%
Average hourly wage                15120  14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
(space) 257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
(space) 257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
(space) 257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
(space) 257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Value    Count
Hours    45360
Dollars  45360
Jobs     15120

Length

Max length: 7
Median length: 5
Mean length: 5.7142857
Min length: 4

Characters and Unicode

Total characters: 604800
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jobs
2nd row: Hours
3rd row: Dollars
4th row: Hours
5th row: Hours

Common Values

Value    Count  Frequency (%)
Hours    45360  42.9%
Dollars  45360  42.9%
Jobs     15120  14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Value      Count
units      75600
thousands  15120
millions   15120

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 635040
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

Value      Count  Frequency (%)
units      75600  71.4%
thousands  15120  14.3%
millions   15120  14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

VALUE
Real number (ℝ)

MISSING 

Distinct: 35398
Distinct (%): 34.4%
Missing: 3024
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
Median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
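
Several of the VALUE statistics are related by simple identities, which makes them easy to cross-check. The sketch below uses only the rounded figures reported for this variable. The variable names are illustrative, and small rounding differences are expected.

```python
# Figures copied from the VALUE statistics in this report (rounded as reported).
n_observations = 105840
n_missing = 3024
mean = 26419.543
std = 118041.35
total = 2.7163518e9      # reported as Sum
q1, q3 = 32, 17660

cv = std / mean                        # coefficient of variation
iqr = q3 - q1                          # interquartile range
implied_count = round(total / mean)    # number of non-missing values

print(round(cv, 4))      # 4.468
print(iqr)               # 17628
print(implied_count)     # 102816
```

The implied count of 102816 equals the 105840 observations minus the 3024 missing values, which suggests that the mean and sum are computed over non-missing values only.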
Histogram with fixed size bins (bins=50)
Common values

Value                 Count  Frequency (%)
30                    2040   1.9%
31                    1949   1.8%
32                    1844   1.7%
29                    1562   1.5%
33                    1476   1.4%
34                    1005   0.9%
28                    905    0.9%
35                    638    0.6%
27                    523    0.5%
36                    457    0.4%
Other values (35388)  90417  85.4%
(Missing)             3024   2.9%

Minimum 10 values

Value  Count  Frequency (%)
0      9      < 0.1%
1      219    0.2%
2      299    0.3%
3      201    0.2%
4      203    0.2%
5      157    0.1%
6      147    0.1%
7      135    0.1%
8      179    0.2%
9      145    0.1%

Maximum 10 values

Value    Count  Frequency (%)
3857813  1      < 0.1%
3690238  1      < 0.1%
3643952  1      < 0.1%
3581317  1      < 0.1%
3556730  1      < 0.1%
3544822  1      < 0.1%
3487265  1      < 0.1%
3425218  1      < 0.1%
3402343  1      < 0.1%
3372266  1      < 0.1%

Interactions


Correlations

                 REF_DATE  VALUE  DGUID  GEO    Sector  Characteristics  Indicators  UOM    SCALAR_FACTOR
REF_DATE         1.000     0.027  0.000  0.000  0.000   0.000            0.000       0.000  0.000
VALUE            0.027     1.000  0.090  0.090  0.051   0.044            0.075       0.081  0.105
DGUID            0.000     0.090  1.000  1.000  0.000   0.000            0.000       0.000  0.000
GEO              0.000     0.090  1.000  1.000  0.000   0.000            0.000       0.000  0.000
Sector           0.000     0.051  0.000  0.000  1.000   0.000            0.000       0.000  0.000
Characteristics  0.000     0.044  0.000  0.000  0.000   1.000            0.000       0.000  0.000
Indicators       0.000     0.075  0.000  0.000  0.000   0.000            1.000       1.000  1.000
UOM              0.000     0.081  0.000  0.000  0.000   0.000            1.000       1.000  0.447
SCALAR_FACTOR    0.000     0.105  0.000  0.000  0.000   0.000            1.000       0.447  1.000
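
The 0.447 entry for UOM against SCALAR_FACTOR can be reproduced by hand with Cramér's V, a chi-squared-based association measure for two categorical variables. The sketch below is a plain-Python illustration, not the profiling tool's own code, and it assumes the uncorrected form of the statistic. The (UOM, SCALAR_FACTOR) pairing per indicator is read off the sample rows of this report. Because every indicator has the same number of rows and Cramér's V does not change when all counts are scaled by the same factor, one entry per indicator is enough.

```python
import math
from collections import Counter

# (UOM, SCALAR_FACTOR) for each of the seven Indicators, read off the sample rows.
pairs = [
    ("Jobs", "units"),         # Number of jobs
    ("Hours", "thousands"),    # Hours worked
    ("Dollars", "millions"),   # Wages and salaries
    ("Hours", "units"),        # Average annual hours worked
    ("Hours", "units"),        # Average weekly hours worked
    ("Dollars", "units"),      # Average annual wages and salaries
    ("Dollars", "units"),      # Average hourly wage
]

def cramers_v(pairs):
    """Uncorrected Cramér's V for a list of (row_label, column_label) pairs."""
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

print(round(cramers_v(pairs), 3))   # 0.447
```

The same function returns 1.0 for any pair of columns that map one-to-one onto each other, which is why DGUID and GEO show a coefficient of 1.000.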

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00
1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00
2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00
3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00
4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00
5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00
6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38
7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00
8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00
9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00

Last rows

Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00
105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00
105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98
105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00
105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00
105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00
105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00
105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00
105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00
105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63

Pandas Profiling Report with Columns Sorted

Overview

Dataset statistics

Number of variables: 9
Number of observations: 105840
Missing cells: 3024
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 MiB
Average record size in memory: 72.0 B

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other field (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
VALUE has 3024 (2.9%) missing values (Missing)
DGUID is uniformly distributed (Uniform)
GEO is uniformly distributed (Uniform)
Sector is uniformly distributed (Uniform)
Characteristics is uniformly distributed (Uniform)
Indicators is uniformly distributed (Uniform)

Reproduction

Analysis started: 2023-11-28 20:51:58.162085
Analysis finished: 2023-11-28 20:52:04.287883
Duration: 6.13 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
Median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520688
Coefficient of variation (CV): 0.0017127605
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.1332052 × 10^8
Variance: 11.916779
Monotonicity: Increasing
Histogram with fixed size bins (bins=12)
Common values

Value             Count  Frequency (%)
2010              8820   8.3%
2011              8820   8.3%
2012              8820   8.3%
2013              8820   8.3%
2014              8820   8.3%
2015              8820   8.3%
2016              8820   8.3%
2017              8820   8.3%
2018              8820   8.3%
2019              8820   8.3%
Other values (2)  17640  16.7%

Minimum 10 values

Value  Count  Frequency (%)
2010   8820   8.3%
2011   8820   8.3%
2012   8820   8.3%
2013   8820   8.3%
2014   8820   8.3%
2015   8820   8.3%
2016   8820   8.3%
2017   8820   8.3%
2018   8820   8.3%
2019   8820   8.3%

Maximum 10 values

Value  Count  Frequency (%)
2021   8820   8.3%
2020   8820   8.3%
2019   8820   8.3%
2018   8820   8.3%
2017   8820   8.3%
2016   8820   8.3%
2015   8820   8.3%
2014   8820   8.3%
2013   8820   8.3%
2012   8820   8.3%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size827.0 KiB
2016A000011124
7560 
2016A000210
7560 
2016A000211
7560 
2016A000212
7560 
2016A000213
7560 
Other values (9)
68040 

Length

Max length14
Median length11
Mean length11.214286
Min length11

Characters and Unicode

Total characters1186920
Distinct characters11
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2016A000011124
2nd row2016A000011124
3rd row2016A000011124
4th row2016A000011124
5th row2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.1%
2016A000210 7560
 
7.1%
2016A000211 7560
 
7.1%
2016A000212 7560
 
7.1%
2016A000213 7560
 
7.1%
2016A000224 7560
 
7.1%
2016A000235 7560
 
7.1%
2016A000246 7560
 
7.1%
2016A000247 7560
 
7.1%
2016A000248 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Canada
7560 
Newfoundland and Labrador
7560 
Prince Edward Island
7560 
Nova Scotia
7560 
New Brunswick
7560 
Other values (9)
68040 

Length

Max length: 25
Median length: 14.5
Mean length: 11.714286
Min length: 5

Characters and Unicode

Total characters: 1239840
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
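
As a sanity check on the length and character totals for GEO, the following Python sketch recomputes them from the 14 labels. The label list is an assumption pieced together from the Common Values table, the word frequencies, and the sample rows of this report; the count of 7560 rows per label comes from the Common Values table.

```python
# Hypothetical reconstruction of the 14 GEO labels that appear in this report.
GEO_LABELS = [
    "Canada", "Newfoundland and Labrador", "Prince Edward Island",
    "Nova Scotia", "New Brunswick", "Quebec", "Ontario", "Manitoba",
    "Saskatchewan", "Alberta", "British Columbia", "Yukon",
    "Northwest Territories", "Nunavut",
]
ROWS_PER_LABEL = 7560  # each label occurs equally often (Uniform alert)

lengths = [len(label) for label in GEO_LABELS]
mean_length = sum(lengths) / len(lengths)

# Every label is repeated ROWS_PER_LABEL times, so totals scale linearly.
total_characters = sum(lengths) * ROWS_PER_LABEL
total_spaces = sum(label.count(" ") for label in GEO_LABELS) * ROWS_PER_LABEL

print("min/max length:", min(lengths), max(lengths))
print("mean length:", round(mean_length, 6))
print("total characters:", total_characters)
print("space characters:", total_spaces)
```

Under these assumptions the mean length, total characters, and space count agree with the figures reported for GEO.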

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.1%
Newfoundland and Labrador 7560
 
7.1%
Prince Edward Island 7560
 
7.1%
Nova Scotia 7560
 
7.1%
New Brunswick 7560
 
7.1%
Quebec 7560
 
7.1%
Ontario 7560
 
7.1%
Manitoba 7560
 
7.1%
Saskatchewan 7560
 
7.1%
Alberta 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
(space) 60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
(space) 60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
(space) 60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
(space) 60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Sector
Categorical

UNIFORM 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Total non-profit institutions
21168 
Total non-profit institutions excluding governments
21168 
Non-profit institutions serving households (community organizations)
21168 
Business non-profit institutions
21168 
Government non-profit institutions
21168 

Length

Max length: 68
Median length: 34
Mean length: 42.8
Min length: 29

Characters and Unicode

Total characters: 4529952
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
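
To illustrate how such per-category character counts can be derived, here is a minimal Python sketch using the standard `unicodedata` module on the five Sector labels. It is not the profiler's own code; the count of 21168 rows per label is taken from the Common Values table, and the two-letter codes (`Ll`, `Lu`, `Zs`, `Pd`, `Ps`, `Pe`) are the Unicode general-category abbreviations for the names shown in this report.

```python
import unicodedata
from collections import Counter

# The five Sector labels as listed in the Common Values table.
SECTOR_LABELS = [
    "Total non-profit institutions",
    "Total non-profit institutions excluding governments",
    "Non-profit institutions serving households (community organizations)",
    "Business non-profit institutions",
    "Government non-profit institutions",
]
ROWS_PER_LABEL = 21168  # each label occurs equally often

# Tally the Unicode general category of every character, weighted by
# how many rows carry each label.
category_counts = Counter()
for label in SECTOR_LABELS:
    for ch in label:
        category_counts[unicodedata.category(ch)] += ROWS_PER_LABEL

for code, count in category_counts.most_common():
    print(code, count)
print("total characters:", sum(category_counts.values()))
```

The six categories found this way correspond to the "Distinct categories" value for Sector.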

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.0%
Total non-profit institutions excluding governments 21168
20.0%
Non-profit institutions serving households (community organizations) 21168
20.0%
Business non-profit institutions 21168
20.0%
Government non-profit institutions 21168
20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
(space) 317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
(space) 317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
(space) 317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
(space) 317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Male employees
 
5880
Female employees
 
5880
Immigrant employees
 
5880
Non-immigrant employees
 
5880
Indigenous identity employees
 
5880
Other values (13)
76440 

Length

Max length: 33
Median length: 28
Mean length: 19.5
Min length: 14

Characters and Unicode

Total characters: 2063880
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

ValueCountFrequency (%)
Male employees 5880
 
5.6%
Female employees 5880
 
5.6%
Immigrant employees 5880
 
5.6%
Non-immigrant employees 5880
 
5.6%
Indigenous identity employees 5880
 
5.6%
Non-indigenous identity employees 5880
 
5.6%
Visible minority 5880
 
5.6%
Not a visible minority 5880
 
5.6%
High school diploma and less 5880
 
5.6%
Trade certificate 5880
 
5.6%
Other values (8) 47040
44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
(space) 235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
(space) 235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
(space) 235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
(space) 235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Number of jobs
15120 
Hours worked
15120 
Wages and salaries
15120 
Average annual hours worked
15120 
Average weekly hours worked
15120 
Other values (2)
30240 

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2268000
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 15120
14.3%
Hours worked 15120
14.3%
Wages and salaries 15120
14.3%
Average annual hours worked 15120
14.3%
Average weekly hours worked 15120
14.3%
Average annual wages and salaries 15120
14.3%
Average hourly wage 15120
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
(space) 257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
(space) 257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
(space) 257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
(space) 257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Hours
45360 
Dollars
45360 
Jobs
15120 

Length

Max length: 7
Median length: 5
Mean length: 5.7142857
Min length: 4

Characters and Unicode

Total characters: 604800
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jobs
2nd row: Hours
3rd row: Dollars
4th row: Hours
5th row: Hours

Common Values

ValueCountFrequency (%)
Hours 45360
42.9%
Dollars 45360
42.9%
Jobs 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
units
75600 
thousands
15120 
millions
15120 

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 635040
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

VALUE
Real number (ℝ)

MISSING 

Distinct: 35398
Distinct (%): 34.4%
Missing: 3024
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
Median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
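
Several of the VALUE statistics follow arithmetically from others. The Python sketch below recomputes them from the reported figures only (no access to the underlying data is assumed), which is a quick way to confirm that the powers of ten in the sum and variance are 10^9 and 10^10. The variable names are illustrative.

```python
import math

# Figures copied from the VALUE section of this report.
n_rows = 105840
n_missing = 3024
mean = 26419.543
std = 118041.35
q1, q3 = 32, 17660
minimum, maximum = 0, 3857813

n_valid = n_rows - n_missing  # statistics are computed on non-missing rows

derived = {
    "missing_pct": 100 * n_missing / n_rows,  # reported as 2.9%
    "cv": std / mean,                         # coefficient of variation
    "variance": std ** 2,                     # square of the std deviation
    "sum": mean * n_valid,                    # mean times non-missing count
    "iqr": q3 - q1,                           # interquartile range
    "range": maximum - minimum,
}

for name, value in derived.items():
    print(f"{name}: {value:.7g}")
```

Small differences in the last digits are expected because the inputs are already rounded.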
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
30 2040
 
1.9%
31 1949
 
1.8%
32 1844
 
1.7%
29 1562
 
1.5%
33 1476
 
1.4%
34 1005
 
0.9%
28 905
 
0.9%
35 638
 
0.6%
27 523
 
0.5%
36 457
 
0.4%
Other values (35388) 90417
85.4%
(Missing) 3024
 
2.9%
ValueCountFrequency (%)
0 9
 
< 0.1%
1 219
0.2%
2 299
0.3%
3 201
0.2%
4 203
0.2%
5 157
0.1%
6 147
0.1%
7 135
0.1%
8 179
0.2%
9 145
0.1%
ValueCountFrequency (%)
3857813 1
< 0.1%
3690238 1
< 0.1%
3643952 1
< 0.1%
3581317 1
< 0.1%
3556730 1
< 0.1%
3544822 1
< 0.1%
3487265 1
< 0.1%
3425218 1
< 0.1%
3402343 1
< 0.1%
3372266 1
< 0.1%

Interactions

[Figure: interaction plots (4 panels)]

Correlations

[Figure: correlation heatmap]
                 REF_DATE  VALUE  DGUID  GEO    Sector  Characteristics  Indicators  UOM    SCALAR_FACTOR
REF_DATE         1.000     0.027  0.000  0.000  0.000   0.000            0.000       0.000  0.000
VALUE            0.027     1.000  0.090  0.090  0.051   0.044            0.075       0.081  0.105
DGUID            0.000     0.090  1.000  1.000  0.000   0.000            0.000       0.000  0.000
GEO              0.000     0.090  1.000  1.000  0.000   0.000            0.000       0.000  0.000
Sector           0.000     0.051  0.000  0.000  1.000   0.000            0.000       0.000  0.000
Characteristics  0.000     0.044  0.000  0.000  0.000   1.000            0.000       0.000  0.000
Indicators       0.000     0.075  0.000  0.000  0.000   0.000            1.000       1.000  1.000
UOM              0.000     0.081  0.000  0.000  0.000   0.000            1.000       1.000  0.447
SCALAR_FACTOR    0.000     0.105  0.000  0.000  0.000   0.000            1.000       0.447  1.000
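
The 1.000 and 0.447 entries among Indicators, UOM, and SCALAR_FACTOR can be reproduced by hand. The sketch below assumes that the categorical entries are Cramér's V (computed here in its plain, uncorrected form) and that each indicator always carries the unit and scalar factor shown in the first seven sample rows of this report; both are assumptions, not something the report states.

```python
from collections import Counter
from math import sqrt

ROWS_PER_INDICATOR = 15120  # from the Indicators Common Values table

# Indicator -> (UOM, SCALAR_FACTOR), read off the first sample rows.
INDICATOR_UNITS = {
    "Number of jobs": ("Jobs", "units"),
    "Hours worked": ("Hours", "thousands"),
    "Wages and salaries": ("Dollars", "millions"),
    "Average annual hours worked": ("Hours", "units"),
    "Average weekly hours worked": ("Hours", "units"),
    "Average annual wages and salaries": ("Dollars", "units"),
    "Average hourly wage": ("Dollars", "units"),
}


def cramers_v(joint):
    """Uncorrected Cramér's V from a {(x, y): count} contingency table."""
    row_totals, col_totals = Counter(), Counter()
    for (x, y), count in joint.items():
        row_totals[x] += count
        col_totals[y] += count
    # chi-squared / n, using the identity sum(O^2 / (R * C)) - 1
    phi2 = sum(
        count * count / (row_totals[x] * col_totals[y])
        for (x, y), count in joint.items()
    ) - 1.0
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(phi2 / k)


uom_vs_scalar = Counter()
indicator_vs_uom = Counter()
for indicator, (uom, scalar) in INDICATOR_UNITS.items():
    uom_vs_scalar[(uom, scalar)] += ROWS_PER_INDICATOR
    indicator_vs_uom[(indicator, uom)] += ROWS_PER_INDICATOR

print("UOM vs SCALAR_FACTOR:", round(cramers_v(uom_vs_scalar), 3))
print("Indicators vs UOM:", round(cramers_v(indicator_vs_uom), 3))
```

A value of 1 for Indicators against UOM reflects that the unit is fully determined by the indicator, which is why the report raises a high-correlation alert for these columns.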

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Row | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00
1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00
2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00
3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00
4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00
5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00
6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38
7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00
8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00
9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00

Last rows

Row | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00
105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00
105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98
105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00
105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00
105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00
105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00
105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00
105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00
105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63
Pandas Profiling Report with Columns Sorted

Overview

Dataset statistics

Number of variables: 9
Number of observations: 105840
Missing cells: 3024
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 MiB
Average record size in memory: 72.0 B
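
The percentage and memory figures in this overview are simple ratios of the other entries. A short Python sketch of that arithmetic, using only numbers from the overview (variable names are illustrative, and the MiB figure is an approximation derived from the rounded record size):

```python
# Figures copied from the dataset overview.
n_variables = 9
n_observations = 105840
missing_cells = 3024
avg_record_bytes = 72.0

# Missing cells are expressed relative to all cells (rows x columns).
total_cells = n_variables * n_observations
missing_cells_pct = 100 * missing_cells / total_cells

# Total size follows from the average record size; 1 MiB = 2**20 bytes.
total_mib = avg_record_bytes * n_observations / 2**20

print("total cells:", total_cells)
print("missing cells (%):", round(missing_cells_pct, 1))
print("total size (MiB):", round(total_mib, 1))
```

Note that the 0.3% here is relative to all cells, whereas the 2.9% in the VALUE alert is relative to the rows of that single column.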

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other field (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
VALUE has 3024 (2.9%) missing values (Missing)
DGUID is uniformly distributed (Uniform)
GEO is uniformly distributed (Uniform)
Sector is uniformly distributed (Uniform)
Characteristics is uniformly distributed (Uniform)
Indicators is uniformly distributed (Uniform)
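
The Uniform alerts are consistent with the table being a full cross product of its dimension columns, with one row per combination. The Python sketch below checks this against the distinct counts reported for each variable. It is an inference from the counts rather than something the report states, and DGUID is left out on the assumption that it maps one-to-one onto GEO.

```python
from math import prod

# Distinct counts per dimension, as reported in the Variables sections.
distinct = {
    "REF_DATE": 12,
    "GEO": 14,
    "Sector": 5,
    "Characteristics": 18,
    "Indicators": 7,
}
n_observations = 105840

# A full grid has one row for every combination of dimension values.
full_grid_rows = prod(distinct.values())

# In a full grid, each category of a dimension appears equally often.
rows_per_category = {name: n_observations // k for name, k in distinct.items()}

print("rows in a full grid:", full_grid_rows)
for name, rows in rows_per_category.items():
    print(f"{name}: {rows} rows per category")
```

The per-category row counts computed this way match the counts in each variable's Common Values table.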

Reproduction

Analysis started: 2023-12-20 15:26:34.938709
Analysis finished: 2023-12-20 15:26:40.785185
Duration: 5.85 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
Median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520688
Coefficient of variation (CV): 0.0017127605
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.1332052 × 10^8
Variance: 11.916779
Monotonicity: Increasing
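
Because REF_DATE takes each year from 2010 to 2021 exactly 8820 times, its descriptive statistics can be derived in closed form. The following Python sketch does so, assuming the variance is the sample variance (divisor n - 1); that divisor is an assumption, chosen because it reproduces the reported value.

```python
from math import sqrt

years = range(2010, 2022)  # 12 distinct years, 2010 through 2021
rows_per_year = 8820       # from the REF_DATE frequency table

n = len(years) * rows_per_year
mean = sum(years) / len(years)      # equal weights, so a plain average
total = rows_per_year * sum(years)  # the "Sum" entry

# Sum of squared deviations, weighted by the rows per year.
sum_sq_dev = rows_per_year * sum((y - mean) ** 2 for y in years)
variance = sum_sq_dev / (n - 1)     # sample variance (assumed divisor)
std = sqrt(variance)
cv = std / mean

print("n:", n)
print("mean:", mean)
print("sum:", total)
print("variance:", round(variance, 6))
print("standard deviation:", round(std, 7))
print("CV:", round(cv, 10))
```

The computed sum of about 2.13 × 10^8 confirms the exponent in the Sum entry.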
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
2010 8820
8.3%
2011 8820
8.3%
2012 8820
8.3%
2013 8820
8.3%
2014 8820
8.3%
2015 8820
8.3%
2016 8820
8.3%
2017 8820
8.3%
2018 8820
8.3%
2019 8820
8.3%
Other values (2) 17640
16.7%
ValueCountFrequency (%)
2010 8820
8.3%
2011 8820
8.3%
2012 8820
8.3%
2013 8820
8.3%
2014 8820
8.3%
2015 8820
8.3%
2016 8820
8.3%
2017 8820
8.3%
2018 8820
8.3%
2019 8820
8.3%
ValueCountFrequency (%)
2021 8820
8.3%
2020 8820
8.3%
2019 8820
8.3%
2018 8820
8.3%
2017 8820
8.3%
2016 8820
8.3%
2015 8820
8.3%
2014 8820
8.3%
2013 8820
8.3%
2012 8820
8.3%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
2016A000011124
7560 
2016A000210
7560 
2016A000211
7560 
2016A000212
7560 
2016A000213
7560 
Other values (9)
68040 

Length

Max length: 14
Median length: 11
Mean length: 11.214286
Min length: 11

Characters and Unicode

Total characters: 1186920
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016A000011124
2nd row: 2016A000011124
3rd row: 2016A000011124
4th row: 2016A000011124
5th row: 2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.1%
2016A000210 7560
 
7.1%
2016A000211 7560
 
7.1%
2016A000212 7560
 
7.1%
2016A000213 7560
 
7.1%
2016A000224 7560
 
7.1%
2016A000235 7560
 
7.1%
2016A000246 7560
 
7.1%
2016A000247 7560
 
7.1%
2016A000248 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Canada
7560 
Newfoundland and Labrador
7560 
Prince Edward Island
7560 
Nova Scotia
7560 
New Brunswick
7560 
Other values (9)
68040 

Length

Max length: 25
Median length: 14.5
Mean length: 11.714286
Min length: 5

Characters and Unicode

Total characters: 1239840
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.1%
Newfoundland and Labrador 7560
 
7.1%
Prince Edward Island 7560
 
7.1%
Nova Scotia 7560
 
7.1%
New Brunswick 7560
 
7.1%
Quebec 7560
 
7.1%
Ontario 7560
 
7.1%
Manitoba 7560
 
7.1%
Saskatchewan 7560
 
7.1%
Alberta 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
(space) 60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
(space) 60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
(space) 60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
(space) 60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Sector
Categorical

UNIFORM 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB
Total non-profit institutions
21168 
Total non-profit institutions excluding governments
21168 
Non-profit institutions serving households (community organizations)
21168 
Business non-profit institutions
21168 
Government non-profit institutions
21168 

Length

Max length: 68
Median length: 34
Mean length: 42.8
Min length: 29

Characters and Unicode

Total characters: 4529952
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

Value | Count | Frequency (%)
Total non-profit institutions | 21168 | 20.0%
Total non-profit institutions excluding governments | 21168 | 20.0%
Non-profit institutions serving households (community organizations) | 21168 | 20.0%
Business non-profit institutions | 21168 | 20.0%
Government non-profit institutions | 21168 | 20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Male employees: 5880
Female employees: 5880
Immigrant employees: 5880
Non-immigrant employees: 5880
Indigenous identity employees: 5880
Other values (13): 76440

Length

Max length: 33
Median length: 28
Mean length: 19.5
Min length: 14

Characters and Unicode

Total characters: 2063880
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

Value | Count | Frequency (%)
Male employees | 5880 | 5.6%
Female employees | 5880 | 5.6%
Immigrant employees | 5880 | 5.6%
Non-immigrant employees | 5880 | 5.6%
Indigenous identity employees | 5880 | 5.6%
Non-indigenous identity employees | 5880 | 5.6%
Visible minority | 5880 | 5.6%
Not a visible minority | 5880 | 5.6%
High school diploma and less | 5880 | 5.6%
Trade certificate | 5880 | 5.6%
Other values (8) | 47040 | 44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Number of jobs: 15120
Hours worked: 15120
Wages and salaries: 15120
Average annual hours worked: 15120
Average weekly hours worked: 15120
Other values (2): 30240

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2268000
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

Value | Count | Frequency (%)
Number of jobs | 15120 | 14.3%
Hours worked | 15120 | 14.3%
Wages and salaries | 15120 | 14.3%
Average annual hours worked | 15120 | 14.3%
Average weekly hours worked | 15120 | 14.3%
Average annual wages and salaries | 15120 | 14.3%
Average hourly wage | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%
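
The word counts above can be re-derived from the seven Indicators labels and their shared row count. The sketch below assumes words are obtained by lowercasing each label and splitting on whitespace. That assumption reproduces the figures shown, but it is not confirmed as the tool's exact tokenization.

```python
from collections import Counter

# The seven Indicators labels from the Common Values table; each one
# occurs in 15120 rows.
indicators = [
    "Number of jobs",
    "Hours worked",
    "Wages and salaries",
    "Average annual hours worked",
    "Average weekly hours worked",
    "Average annual wages and salaries",
    "Average hourly wage",
]
rows_per_label = 15120

# Assumed tokenization: lowercase, split on whitespace.
word_counts = Counter()
for label in indicators:
    for word in label.lower().split():
        word_counts[word] += rows_per_label

total_words = sum(word_counts.values())
for word, count in word_counts.most_common(3):
    print(word, count, f"{100 * count / total_words:.1f}%")
```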

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Hours: 45360
Dollars: 45360
Jobs: 15120

Length

Max length: 7
Median length: 5
Mean length: 5.7142857
Min length: 4

Characters and Unicode

Total characters: 604800
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
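
The length and character totals reported for UOM follow directly from the three values and their row counts. A minimal check, with illustrative variable names:

```python
# UOM values and their row counts, as listed in this report.
uom_counts = {"Hours": 45360, "Dollars": 45360, "Jobs": 15120}

total_rows = sum(uom_counts.values())
# Every occurrence of a value contributes len(value) characters.
total_chars = sum(len(value) * n for value, n in uom_counts.items())
mean_length = total_chars / total_rows

print(total_rows, total_chars, round(mean_length, 7))
```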
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jobs
2nd row: Hours
3rd row: Dollars
4th row: Hours
5th row: Hours

Common Values

Value | Count | Frequency (%)
Hours | 45360 | 42.9%
Dollars | 45360 | 42.9%
Jobs | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

units: 75600
thousands: 15120
millions: 15120

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 635040
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

Value | Count | Frequency (%)
units | 75600 | 71.4%
thousands | 15120 | 14.3%
millions | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

VALUE
Real number (ℝ)

MISSING 

Distinct: 35398
Distinct (%): 34.4%
Missing: 3024
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
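
Several of the VALUE statistics are simple functions of the others. The sketch below recomputes the coefficient of variation, the interquartile range and the variance from the rounded figures printed in this report, so small rounding differences are expected. Variable names are illustrative.

```python
# Rounded figures copied from the VALUE statistics in this report.
mean = 26419.543
std = 118041.35
q1, q3 = 32, 17660

cv = std / mean        # coefficient of variation
iqr = q3 - q1          # interquartile range
variance = std ** 2    # variance is the squared standard deviation

print(f"CV = {cv:.4f}, IQR = {iqr}, variance = {variance:.6e}")
```

A CV well above 1 together with a median (1386) far below the mean is consistent with the strong right skew reported for VALUE.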
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
30 2040
 
1.9%
31 1949
 
1.8%
32 1844
 
1.7%
29 1562
 
1.5%
33 1476
 
1.4%
34 1005
 
0.9%
28 905
 
0.9%
35 638
 
0.6%
27 523
 
0.5%
36 457
 
0.4%
Other values (35388) 90417
85.4%
(Missing) 3024
 
2.9%
ValueCountFrequency (%)
0 9
 
< 0.1%
1 219
0.2%
2 299
0.3%
3 201
0.2%
4 203
0.2%
5 157
0.1%
6 147
0.1%
7 135
0.1%
8 179
0.2%
9 145
0.1%
ValueCountFrequency (%)
3857813 1
< 0.1%
3690238 1
< 0.1%
3643952 1
< 0.1%
3581317 1
< 0.1%
3556730 1
< 0.1%
3544822 1
< 0.1%
3487265 1
< 0.1%
3425218 1
< 0.1%
3402343 1
< 0.1%
3372266 1
< 0.1%

Interactions

[Figures: four interaction plots (Matplotlib v3.7.2); images not included]

Correlations

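Variable | REF_DATE | VALUE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.105
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.447
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 1.000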
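
The categorical entries in this matrix can be approximated with an uncorrected Cramér's V. The sketch below assumes that the tool uses a Cramér's V-style measure for categorical pairs, which the report itself does not state. The `cramers_v` helper and the `rows` list are illustrative. Because every indicator has the same number of rows, one row per indicator gives the same statistic as the full dataset.

```python
import math
from collections import Counter

# (Indicators, UOM, SCALAR_FACTOR) combinations taken from the sample rows.
rows = [
    ("Number of jobs", "Jobs", "units"),
    ("Hours worked", "Hours", "thousands"),
    ("Wages and salaries", "Dollars", "millions"),
    ("Average annual hours worked", "Hours", "units"),
    ("Average weekly hours worked", "Hours", "units"),
    ("Average annual wages and salaries", "Dollars", "units"),
    ("Average hourly wage", "Dollars", "units"),
]

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_totals), len(y_totals)) - 1
    return math.sqrt(chi2 / (n * k))

indicators = [r[0] for r in rows]
uom = [r[1] for r in rows]
scalar = [r[2] for r in rows]

v_ind_uom = cramers_v(indicators, uom)
v_ind_scalar = cramers_v(indicators, scalar)
v_uom_scalar = cramers_v(uom, scalar)
print(round(v_ind_uom, 3), round(v_ind_scalar, 3), round(v_uom_scalar, 3))
```

Under this assumption, Indicators fully determines both UOM and SCALAR_FACTOR, while UOM and SCALAR_FACTOR are only partially associated, which matches the pattern in the table.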

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00
1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00
2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00
3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00
4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00
5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00
6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38
7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00
8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00
9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00

Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00
105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00
105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98
105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00
105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00
105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00
105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00
105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00
105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00
105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63
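
The SCALAR_FACTOR column matters when comparing rows: totals are stored in thousands or millions, while averages are stored in units. The sketch below rescales the first three sample rows to base units and derives the averages from them. The relationships used (average = total / number of jobs, hourly wage = wages / hours) are an assumption that agrees with the sample rows up to rounding; they are not documented in this report.

```python
# Multipliers for the three SCALAR_FACTOR values seen in the data.
SCALE = {"units": 1, "thousands": 1_000, "millions": 1_000_000}

# First three sample rows (Canada, 2010, male employees): (VALUE, SCALAR_FACTOR).
sample = {
    "Number of jobs": (642584.00, "units"),
    "Hours worked": (1048516.00, "thousands"),
    "Wages and salaries": (30805.00, "millions"),
}

# Convert every total to base units before combining them.
base = {name: value * SCALE[factor] for name, (value, factor) in sample.items()}

avg_annual_hours = base["Hours worked"] / base["Number of jobs"]
avg_annual_wages = base["Wages and salaries"] / base["Number of jobs"]
avg_hourly_wage = base["Wages and salaries"] / base["Hours worked"]

print(round(avg_annual_hours), round(avg_annual_wages), round(avg_hourly_wage, 2))
```

The derived values land close to the reported 1632 hours, 47940 dollars and 29.38 dollars per hour; the small gap in annual wages comes from the totals being rounded to whole millions.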
Pandas Profiling Report with Columns Sorted

Overview

Dataset statistics

Number of variables: 9
Number of observations: 105840
Missing cells: 3024
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 MiB
Average record size in memory: 72.0 B
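
The two missing-value percentages in this report use different denominators: the dataset-level figure divides by all cells, while the VALUE figure divides by rows. A short check, with illustrative names:

```python
n_rows = 105840   # number of observations
n_cols = 9        # number of variables
missing = 3024    # missing cells, all of them in VALUE

# Share of all cells in the table that are missing.
missing_cells_pct = 100 * missing / (n_rows * n_cols)
# Share of VALUE entries that are missing.
missing_value_pct = 100 * missing / n_rows

print(f"{missing_cells_pct:.1f}% of cells, {missing_value_pct:.1f}% of VALUE")
```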

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other fields (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
VALUE has 3024 (2.9%) missing values (Missing)
DGUID is uniformly distributed (Uniform)
GEO is uniformly distributed (Uniform)
Sector is uniformly distributed (Uniform)
Characteristics is uniformly distributed (Uniform)
Indicators is uniformly distributed (Uniform)
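
The Uniform alerts are what one would expect if the table is a complete cross of its five dimensions. That reading is an assumption, but the distinct counts in this report multiply out to exactly the number of observations, as the sketch below shows. GEO and DGUID are treated as one dimension because they are perfectly correlated.

```python
import math

# Distinct counts reported for each dimension.
distinct = {
    "REF_DATE": 12,
    "GEO": 14,
    "Sector": 5,
    "Characteristics": 18,
    "Indicators": 7,
}

# Size of the full grid if every combination occurs exactly once.
full_grid = math.prod(distinct.values())

# In a full grid, each level of a dimension appears grid_size / n_levels times.
rows_per_level = {name: full_grid // n for name, n in distinct.items()}

print(full_grid)
print(rows_per_level)
```

The per-level counts match the frequencies shown for each variable (8820 per year, 7560 per geography, 21168 per sector, 5880 per characteristic, 15120 per indicator).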

Reproduction

Analysis started: 2023-12-20 15:45:16.412286
Analysis finished: 2023-12-20 15:45:21.448233
Duration: 5.04 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520688
Coefficient of variation (CV): 0.0017127605
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.1332052 × 10^8
Variance: 11.916779
Monotonicity: Increasing
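
Because REF_DATE takes each year from 2010 to 2021 equally often, its moments can be checked by hand. The sketch below assumes the report uses the sample variance (divisor n - 1), which matches the printed figures; the variable names are illustrative.

```python
import math
import statistics

years = list(range(2010, 2022))   # 12 distinct years
rows_per_year = 8820
n = len(years) * rows_per_year    # 105840 observations

mean = statistics.fmean(years)
# Population variance of 12 equally weighted years.
pop_var = statistics.pvariance(years)
# Rescale to the sample variance over all n rows (assumed divisor n - 1).
sample_var = pop_var * n / (n - 1)
std = math.sqrt(sample_var)
total = mean * n

print(mean, round(sample_var, 6), round(std, 7), total)
```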
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
2010 8820
8.3%
2011 8820
8.3%
2012 8820
8.3%
2013 8820
8.3%
2014 8820
8.3%
2015 8820
8.3%
2016 8820
8.3%
2017 8820
8.3%
2018 8820
8.3%
2019 8820
8.3%
Other values (2) 17640
16.7%
ValueCountFrequency (%)
2010 8820
8.3%
2011 8820
8.3%
2012 8820
8.3%
2013 8820
8.3%
2014 8820
8.3%
2015 8820
8.3%
2016 8820
8.3%
2017 8820
8.3%
2018 8820
8.3%
2019 8820
8.3%
ValueCountFrequency (%)
2021 8820
8.3%
2020 8820
8.3%
2019 8820
8.3%
2018 8820
8.3%
2017 8820
8.3%
2016 8820
8.3%
2015 8820
8.3%
2014 8820
8.3%
2013 8820
8.3%
2012 8820
8.3%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

2016A000011124: 7560
2016A000210: 7560
2016A000211: 7560
2016A000212: 7560
2016A000213: 7560
Other values (9): 68040

Length

Max length: 14
Median length: 11
Mean length: 11.214286
Min length: 11

Characters and Unicode

Total characters: 1186920
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016A000011124
2nd row: 2016A000011124
3rd row: 2016A000011124
4th row: 2016A000011124
5th row: 2016A000011124

Common Values

Value | Count | Frequency (%)
2016A000011124 | 7560 | 7.1%
2016A000210 | 7560 | 7.1%
2016A000211 | 7560 | 7.1%
2016A000212 | 7560 | 7.1%
2016A000213 | 7560 | 7.1%
2016A000224 | 7560 | 7.1%
2016A000235 | 7560 | 7.1%
2016A000246 | 7560 | 7.1%
2016A000247 | 7560 | 7.1%
2016A000248 | 7560 | 7.1%
Other values (4) | 30240 | 28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Canada: 7560
Newfoundland and Labrador: 7560
Prince Edward Island: 7560
Nova Scotia: 7560
New Brunswick: 7560
Other values (9): 68040

Length

Max length: 25
Median length: 14.5
Mean length: 11.714286
Min length: 5

Characters and Unicode

Total characters: 1239840
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

Value | Count | Frequency (%)
Canada | 7560 | 7.1%
Newfoundland and Labrador | 7560 | 7.1%
Prince Edward Island | 7560 | 7.1%
Nova Scotia | 7560 | 7.1%
New Brunswick | 7560 | 7.1%
Quebec | 7560 | 7.1%
Ontario | 7560 | 7.1%
Manitoba | 7560 | 7.1%
Saskatchewan | 7560 | 7.1%
Alberta | 7560 | 7.1%
Other values (4) | 30240 | 28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Sector
Categorical

UNIFORM 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Total non-profit institutions: 21168
Total non-profit institutions excluding governments: 21168
Non-profit institutions serving households (community organizations): 21168
Business non-profit institutions: 21168
Government non-profit institutions: 21168

Length

Max length: 68
Median length: 34
Mean length: 42.8
Min length: 29

Characters and Unicode

Total characters: 4529952
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

Value | Count | Frequency (%)
Total non-profit institutions | 21168 | 20.0%
Total non-profit institutions excluding governments | 21168 | 20.0%
Non-profit institutions serving households (community organizations) | 21168 | 20.0%
Business non-profit institutions | 21168 | 20.0%
Government non-profit institutions | 21168 | 20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Male employees: 5880
Female employees: 5880
Immigrant employees: 5880
Non-immigrant employees: 5880
Indigenous identity employees: 5880
Other values (13): 76440

Length

Max length: 33
Median length: 28
Mean length: 19.5
Min length: 14

Characters and Unicode

Total characters: 2063880
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

Value | Count | Frequency (%)
Male employees | 5880 | 5.6%
Female employees | 5880 | 5.6%
Immigrant employees | 5880 | 5.6%
Non-immigrant employees | 5880 | 5.6%
Indigenous identity employees | 5880 | 5.6%
Non-indigenous identity employees | 5880 | 5.6%
Visible minority | 5880 | 5.6%
Not a visible minority | 5880 | 5.6%
High school diploma and less | 5880 | 5.6%
Trade certificate | 5880 | 5.6%
Other values (8) | 47040 | 44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Number of jobs: 15120
Hours worked: 15120
Wages and salaries: 15120
Average annual hours worked: 15120
Average weekly hours worked: 15120
Other values (2): 30240

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2268000
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

Value | Count | Frequency (%)
Number of jobs | 15120 | 14.3%
Hours worked | 15120 | 14.3%
Wages and salaries | 15120 | 14.3%
Average annual hours worked | 15120 | 14.3%
Average weekly hours worked | 15120 | 14.3%
Average annual wages and salaries | 15120 | 14.3%
Average hourly wage | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Hours: 45360
Dollars: 45360
Jobs: 15120

Length

Max length: 7
Median length: 5
Mean length: 5.7142857
Min length: 4

Characters and Unicode

Total characters: 604800
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jobs
2nd row: Hours
3rd row: Dollars
4th row: Hours
5th row: Hours

Common Values

Value | Count | Frequency (%)
Hours | 45360 | 42.9%
Dollars | 45360 | 42.9%
Jobs | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

units: 75600
thousands: 15120
millions: 15120

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 635040
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

Value | Count | Frequency (%)
units | 75600 | 71.4%
thousands | 15120 | 14.3%
millions | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

VALUE
Real number (ℝ)

MISSING 

Distinct: 35398
Distinct (%): 34.4%
Missing: 3024
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
Median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
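
Several of the VALUE figures are derived from one another, which gives a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation. The variable names are illustrative, and the inputs are copied from this report rather than recomputed from the underlying data.

```python
# Figures copied from the VALUE summary in this report.
minimum, maximum = 0, 3857813
q1, q3 = 32, 17660
mean, std = 26419.543, 118041.35

value_range = maximum - minimum   # reported Range: 3857813
iqr = q3 - q1                     # reported IQR: 17628
cv = std / mean                   # reported CV: 4.4679558
variance = std ** 2               # reported Variance: 1.393376 × 10^10

print(value_range, iqr, round(cv, 4), f"{variance:.4e}")
```

The large CV and skewness are consistent with VALUE mixing indicators on very different scales (job counts, hours, dollars), so the pooled statistics are best read per indicator rather than as a single distribution.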
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
30 | 2040 | 1.9%
31 | 1949 | 1.8%
32 | 1844 | 1.7%
29 | 1562 | 1.5%
33 | 1476 | 1.4%
34 | 1005 | 0.9%
28 | 905 | 0.9%
35 | 638 | 0.6%
27 | 523 | 0.5%
36 | 457 | 0.4%
Other values (35388) | 90417 | 85.4%
(Missing) | 3024 | 2.9%

Value | Count | Frequency (%)
0 | 9 | < 0.1%
1 | 219 | 0.2%
2 | 299 | 0.3%
3 | 201 | 0.2%
4 | 203 | 0.2%
5 | 157 | 0.1%
6 | 147 | 0.1%
7 | 135 | 0.1%
8 | 179 | 0.2%
9 | 145 | 0.1%

Value | Count | Frequency (%)
3857813 | 1 | < 0.1%
3690238 | 1 | < 0.1%
3643952 | 1 | < 0.1%
3581317 | 1 | < 0.1%
3556730 | 1 | < 0.1%
3544822 | 1 | < 0.1%
3487265 | 1 | < 0.1%
3425218 | 1 | < 0.1%
3402343 | 1 | < 0.1%
3372266 | 1 | < 0.1%

Interactions

(Four interaction plots; images not preserved.)

Correlations

Variable        | REF_DATE | VALUE | DGUID | GEO   | Sector | Characteristics | Indicators | UOM   | SCALAR_FACTOR
REF_DATE        | 1.000    | 0.027 | 0.000 | 0.000 | 0.000  | 0.000           | 0.000      | 0.000 | 0.000
VALUE           | 0.027    | 1.000 | 0.090 | 0.090 | 0.051  | 0.044           | 0.075      | 0.081 | 0.105
DGUID           | 0.000    | 0.090 | 1.000 | 1.000 | 0.000  | 0.000           | 0.000      | 0.000 | 0.000
GEO             | 0.000    | 0.090 | 1.000 | 1.000 | 0.000  | 0.000           | 0.000      | 0.000 | 0.000
Sector          | 0.000    | 0.051 | 0.000 | 0.000 | 1.000  | 0.000           | 0.000      | 0.000 | 0.000
Characteristics | 0.000    | 0.044 | 0.000 | 0.000 | 0.000  | 1.000           | 0.000      | 0.000 | 0.000
Indicators      | 0.000    | 0.075 | 0.000 | 0.000 | 0.000  | 0.000           | 1.000      | 1.000 | 1.000
UOM             | 0.000    | 0.081 | 0.000 | 0.000 | 0.000  | 0.000           | 1.000      | 1.000 | 0.447
SCALAR_FACTOR   | 0.000    | 0.105 | 0.000 | 0.000 | 0.000  | 0.000           | 1.000      | 0.447 | 1.000
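
The 1.000 and 0.447 entries among Indicators, UOM and SCALAR_FACTOR can be reproduced with Cramér's V, a common association measure for pairs of categorical variables. This is a sketch under two assumptions that the report itself does not state: that the matrix uses Cramér's V for categorical pairs, and that every indicator always carries the unit and scalar factor shown for it in the sample rows (each indicator has 15120 rows). The function name `cramers_v` and the pairing table are illustrative.

```python
import math
from collections import Counter

ROWS_PER_INDICATOR = 15120  # from the Indicators frequency table

# Assumed (UOM, SCALAR_FACTOR) for each indicator, read off the sample rows.
PAIRING = {
    "Number of jobs": ("Jobs", "units"),
    "Hours worked": ("Hours", "thousands"),
    "Wages and salaries": ("Dollars", "millions"),
    "Average annual hours worked": ("Hours", "units"),
    "Average weekly hours worked": ("Hours", "units"),
    "Average annual wages and salaries": ("Dollars", "units"),
    "Average hourly wage": ("Dollars", "units"),
}

def cramers_v(joint):
    """Cramér's V (no bias correction) from a {(a, b): count} table."""
    row, col = Counter(), Counter()
    for (a, b), count in joint.items():
        row[a] += count
        col[b] += count
    # chi2 / n equals sum(O^2 / (R * C)) - 1; empty cells contribute nothing.
    phi2 = sum(c * c / (row[a] * col[b]) for (a, b), c in joint.items()) - 1
    return math.sqrt(phi2 / (min(len(row), len(col)) - 1))

uom_vs_scalar, indicator_vs_uom = Counter(), Counter()
for indicator, (uom, scalar) in PAIRING.items():
    uom_vs_scalar[(uom, scalar)] += ROWS_PER_INDICATOR
    indicator_vs_uom[(indicator, uom)] += ROWS_PER_INDICATOR

print(round(cramers_v(uom_vs_scalar), 3))     # compare with 0.447 in the matrix
print(round(cramers_v(indicator_vs_uom), 3))  # compare with 1.000 in the matrix
```

Under these assumptions, a value of 1.000 for Indicators against UOM only says that the indicator determines the unit, so the High correlation alerts reflect how the table is laid out rather than a relationship in the measured values.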

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(row) | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00
1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00
2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00
3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00
4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00
5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00
6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38
7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00
8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00
9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00

(row) | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00
105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00
105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98
105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00
105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00
105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00
105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00
105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00
105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00
105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63

Pandas Profiling Report with Columns Sorted

Overview

Dataset statistics

Number of variables: 9
Number of observations: 105840
Missing cells: 3024
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 MiB
Average record size in memory: 72.0 B
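
The missing-cell percentage here (0.3%) and the VALUE missing percentage reported elsewhere in this report (2.9%) describe the same 3024 cells over different denominators. A small sketch of the arithmetic, using only figures from the report (the variable names are illustrative):

```python
n_rows, n_cols = 105840, 9
missing_cells = 3024  # the report attributes all of them to VALUE

# Share of all cells in the table that are missing.
missing_cells_pct = 100 * missing_cells / (n_rows * n_cols)
# Share of the VALUE column alone that is missing.
value_missing_pct = 100 * missing_cells / n_rows

# Total memory implied by the average record size of 72.0 bytes.
total_mib = n_rows * 72.0 / 2**20

print(f"{missing_cells_pct:.1f}% {value_missing_pct:.1f}% {total_mib:.1f} MiB")
```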

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other field (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
VALUE has 3024 (2.9%) missing values (Missing)
DGUID is uniformly distributed (Uniform)
GEO is uniformly distributed (Uniform)
Sector is uniformly distributed (Uniform)
Characteristics is uniformly distributed (Uniform)
Indicators is uniformly distributed (Uniform)
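
The Uniform alerts are what one would expect if the table holds exactly one row for every combination of year, geography, sector, characteristic and indicator. That reading is an inference from the distinct counts and per-value frequencies, not something the report states. The check below multiplies the distinct counts and derives the per-value frequency each column should then show; the dictionary and variable names are illustrative.

```python
from math import prod

n_rows = 105840
distinct = {
    "REF_DATE": 12,
    "GEO": 14,             # DGUID has the same 14 levels
    "Sector": 5,
    "Characteristics": 18,
    "Indicators": 7,
}

# A full cross of the five dimensions has this many rows.
full_cross = prod(distinct.values())

# In a full cross, every level of a column appears equally often.
rows_per_level = {name: n_rows // k for name, k in distinct.items()}

print(full_cross == n_rows)
print(rows_per_level)
```

The derived per-level counts can be compared with the frequency tables for each variable (8820 per year, 7560 per geography, 21168 per sector, 5880 per characteristic, 15120 per indicator).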

Reproduction

Analysis started: 2023-12-20 15:50:55.309213
Analysis finished: 2023-12-20 15:51:01.491981
Duration: 6.18 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 827.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
Median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520688
Coefficient of variation (CV): 0.0017127605
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.1332052 × 10^8
Variance: 11.916779
Monotonicity: Increasing
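
Because each of the 12 years appears 8820 times, these moments can be reproduced from the frequency table alone. The sketch rebuilds the column and uses Python's `statistics` module; it assumes the report uses the sample (n - 1) variance, which is the convention that matches the printed figures. The variable names are illustrative.

```python
import math
import statistics

# Rebuild REF_DATE from its frequency table: 2010..2021, 8820 rows each.
ref_date = [year for year in range(2010, 2022) for _ in range(8820)]

mean = statistics.fmean(ref_date)          # reported Mean: 2015.5
variance = statistics.variance(ref_date)   # sample variance; reported 11.916779
std = math.sqrt(variance)                  # reported 3.4520688
total = sum(ref_date)                      # reported Sum: 2.1332052 × 10^8

print(len(ref_date), mean, round(variance, 6), round(std, 7), total)
```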
Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%
Other values (2) | 17640 | 16.7%

Value | Count | Frequency (%)
2010 | 8820 | 8.3%
2011 | 8820 | 8.3%
2012 | 8820 | 8.3%
2013 | 8820 | 8.3%
2014 | 8820 | 8.3%
2015 | 8820 | 8.3%
2016 | 8820 | 8.3%
2017 | 8820 | 8.3%
2018 | 8820 | 8.3%
2019 | 8820 | 8.3%

Value | Count | Frequency (%)
2021 | 8820 | 8.3%
2020 | 8820 | 8.3%
2019 | 8820 | 8.3%
2018 | 8820 | 8.3%
2017 | 8820 | 8.3%
2016 | 8820 | 8.3%
2015 | 8820 | 8.3%
2014 | 8820 | 8.3%
2013 | 8820 | 8.3%
2012 | 8820 | 8.3%

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

2016A000011124: 7560
2016A000210: 7560
2016A000211: 7560
2016A000212: 7560
2016A000213: 7560
Other values (9): 68040

Length

Max length: 14
Median length: 11
Mean length: 11.214286
Min length: 11

Characters and Unicode

Total characters: 1186920
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016A000011124
2nd row: 2016A000011124
3rd row: 2016A000011124
4th row: 2016A000011124
5th row: 2016A000011124

Common Values

Value | Count | Frequency (%)
2016A000011124 | 7560 | 7.1%
2016A000210 | 7560 | 7.1%
2016A000211 | 7560 | 7.1%
2016A000212 | 7560 | 7.1%
2016A000213 | 7560 | 7.1%
2016A000224 | 7560 | 7.1%
2016A000235 | 7560 | 7.1%
2016A000246 | 7560 | 7.1%
2016A000247 | 7560 | 7.1%
2016A000248 | 7560 | 7.1%
Other values (4) | 30240 | 28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Canada: 7560
Newfoundland and Labrador: 7560
Prince Edward Island: 7560
Nova Scotia: 7560
New Brunswick: 7560
Other values (9): 68040

Length

Max length: 25
Median length: 14.5
Mean length: 11.714286
Min length: 5

Characters and Unicode

Total characters: 1239840
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

Value | Count | Frequency (%)
Canada | 7560 | 7.1%
Newfoundland and Labrador | 7560 | 7.1%
Prince Edward Island | 7560 | 7.1%
Nova Scotia | 7560 | 7.1%
New Brunswick | 7560 | 7.1%
Quebec | 7560 | 7.1%
Ontario | 7560 | 7.1%
Manitoba | 7560 | 7.1%
Saskatchewan | 7560 | 7.1%
Alberta | 7560 | 7.1%
Other values (4) | 30240 | 28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Sector
Categorical

UNIFORM 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Total non-profit institutions: 21168
Total non-profit institutions excluding governments: 21168
Non-profit institutions serving households (community organizations): 21168
Business non-profit institutions: 21168
Government non-profit institutions: 21168

Length

Max length: 68
Median length: 34
Mean length: 42.8
Min length: 29

Characters and Unicode

Total characters: 4529952
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

Value | Count | Frequency (%)
Total non-profit institutions | 21168 | 20.0%
Total non-profit institutions excluding governments | 21168 | 20.0%
Non-profit institutions serving households (community organizations) | 21168 | 20.0%
Business non-profit institutions | 21168 | 20.0%
Government non-profit institutions | 21168 | 20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Male employees: 5880
Female employees: 5880
Immigrant employees: 5880
Non-immigrant employees: 5880
Indigenous identity employees: 5880
Other values (13): 76440

Length

Max length: 33
Median length: 28
Mean length: 19.5
Min length: 14

Characters and Unicode

Total characters: 2063880
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

Value | Count | Frequency (%)
Male employees | 5880 | 5.6%
Female employees | 5880 | 5.6%
Immigrant employees | 5880 | 5.6%
Non-immigrant employees | 5880 | 5.6%
Indigenous identity employees | 5880 | 5.6%
Non-indigenous identity employees | 5880 | 5.6%
Visible minority | 5880 | 5.6%
Not a visible minority | 5880 | 5.6%
High school diploma and less | 5880 | 5.6%
Trade certificate | 5880 | 5.6%
Other values (8) | 47040 | 44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 827.0 KiB

Number of jobs: 15120
Hours worked: 15120
Wages and salaries: 15120
Average annual hours worked: 15120
Average weekly hours worked: 15120
Other values (2): 30240

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2268000
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

Value | Count | Frequency (%)
Number of jobs | 15120 | 14.3%
Hours worked | 15120 | 14.3%
Wages and salaries | 15120 | 14.3%
Average annual hours worked | 15120 | 14.3%
Average weekly hours worked | 15120 | 14.3%
Average annual wages and salaries | 15120 | 14.3%
Average hourly wage | 15120 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB
Hours 45360
Dollars 45360
Jobs 15120

Length

Max length 7
Median length 5
Mean length 5.7142857
Min length 4

Characters and Unicode

Total characters 604800
Distinct characters 10
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowJobs
2nd rowHours
3rd rowDollars
4th rowHours
5th rowHours

Common Values

ValueCountFrequency (%)
Hours 45360
42.9%
Dollars 45360
42.9%
Jobs 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB
units 75600
thousands 15120
millions 15120

Length

Max length 9
Median length 5
Mean length 6
Min length 5

Characters and Unicode

Total characters 635040
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowunits
2nd rowthousands
3rd rowmillions
4th rowunits
5th rowunits

Common Values

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 635040
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 635040
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 635040
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 120960
19.0%
n 105840
16.7%
i 105840
16.7%
u 90720
14.3%
t 90720
14.3%
o 30240
 
4.8%
l 30240
 
4.8%
h 15120
 
2.4%
a 15120
 
2.4%
d 15120
 
2.4%

VALUE
Real number (ℝ)

MISSING 

Distinct 35398
Distinct (%) 34.4%
Missing 3024
Missing (%) 2.9%
Infinite 0
Infinite (%) 0.0%
Mean 26419.543
Minimum 0
Maximum 3857813
Zeros 9
Zeros (%) < 0.1%
Negative 0
Negative (%) 0.0%
Memory size 827.0 KiB

Quantile statistics

Minimum 0
5-th percentile 18.6075
Q1 32
median 1386
Q3 17660
95-th percentile 85576.5
Maximum 3857813
Range 3857813
Interquartile range (IQR) 17628

Descriptive statistics

Standard deviation 118041.35
Coefficient of variation (CV) 4.4679558
Kurtosis 266.93641
Mean 26419.543
Median Absolute Deviation (MAD) 1358
Skewness 13.831207
Sum 2.7163518 × 10^9
Variance 1.393376 × 10^10
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
30 | 2040 | 1.9%
31 | 1949 | 1.8%
32 | 1844 | 1.7%
29 | 1562 | 1.5%
33 | 1476 | 1.4%
34 | 1005 | 0.9%
28 | 905 | 0.9%
35 | 638 | 0.6%
27 | 523 | 0.5%
36 | 457 | 0.4%
Other values (35388) | 90417 | 85.4%
(Missing) | 3024 | 2.9%

Value | Count | Frequency (%)
0 | 9 | < 0.1%
1 | 219 | 0.2%
2 | 299 | 0.3%
3 | 201 | 0.2%
4 | 203 | 0.2%
5 | 157 | 0.1%
6 | 147 | 0.1%
7 | 135 | 0.1%
8 | 179 | 0.2%
9 | 145 | 0.1%

Value | Count | Frequency (%)
3857813 | 1 | < 0.1%
3690238 | 1 | < 0.1%
3643952 | 1 | < 0.1%
3581317 | 1 | < 0.1%
3556730 | 1 | < 0.1%
3544822 | 1 | < 0.1%
3487265 | 1 | < 0.1%
3425218 | 1 | < 0.1%
3402343 | 1 | < 0.1%
3372266 | 1 | < 0.1%
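
Several of the VALUE statistics are derived from one another, so they can be sanity-checked with plain arithmetic. The sketch below is a minimal check using the figures reported for VALUE; it assumes the mean is taken over the non-missing rows only (105840 − 3024), which is the usual convention but is not stated in the report.

```python
# Summary figures for VALUE, copied from the report.
mean_value = 26419.543
std = 118041.35
q1, q3 = 32, 17660
minimum, maximum = 0, 3857813
n_rows, n_missing = 105840, 3024

cv = std / mean_value               # coefficient of variation
variance = std ** 2                 # variance is the squared standard deviation
iqr = q3 - q1                       # interquartile range
value_range = maximum - minimum
# Assumption: the mean is computed over non-missing rows only.
total = mean_value * (n_rows - n_missing)

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.6e}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Sum      = {total:.7e}")
```

Small differences in the last digits are expected, because the inputs are themselves rounded in the report.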

Interactions


Correlations

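|  | REF_DATE | VALUE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR |
| REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.105 |
| DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Sector | 0.000 | 0.051 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Characteristics | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 |
| Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 |
| UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.447 |
| SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 1.000 |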
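
The 1.000 and 0.447 entries among Indicators, UOM and SCALAR_FACTOR can be reproduced by hand. The sketch below assumes the categorical-to-categorical entries are Cramér's V (the report does not name the measure) and uses the plain, uncorrected formula. The indicator-to-unit mapping is read off the sample rows of this report; `cramers_v` and `joint_counts` are hypothetical helper names, not part of ydata-profiling.

```python
import math
from collections import Counter

# Indicator -> (UOM, SCALAR_FACTOR), as the pairs appear in the report's sample rows.
INDICATOR_UNITS = {
    "Number of jobs": ("Jobs", "units"),
    "Hours worked": ("Hours", "thousands"),
    "Wages and salaries": ("Dollars", "millions"),
    "Average annual hours worked": ("Hours", "units"),
    "Average weekly hours worked": ("Hours", "units"),
    "Average annual wages and salaries": ("Dollars", "units"),
    "Average hourly wage": ("Dollars", "units"),
}
ROWS_PER_INDICATOR = 15120


def cramers_v(pairs):
    """Plain (uncorrected) Cramér's V for a mapping {(x, y): count}."""
    n = sum(pairs.values())
    row_totals, col_totals = Counter(), Counter()
    for (x, y), count in pairs.items():
        row_totals[x] += count
        col_totals[y] += count
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = pairs.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


def joint_counts(first, second):
    """Count co-occurrences of two of the three columns across all rows."""
    counts = Counter()
    for indicator, (uom, scalar) in INDICATOR_UNITS.items():
        record = {"Indicators": indicator, "UOM": uom, "SCALAR_FACTOR": scalar}
        counts[(record[first], record[second])] += ROWS_PER_INDICATOR
    return counts


v_uom_scalar = cramers_v(joint_counts("UOM", "SCALAR_FACTOR"))
v_ind_uom = cramers_v(joint_counts("Indicators", "UOM"))
print(f"UOM vs SCALAR_FACTOR: {v_uom_scalar:.3f}")
print(f"Indicators vs UOM:    {v_ind_uom:.3f}")
```

Under these assumptions the sketch gives about 0.447 for UOM against SCALAR_FACTOR and 1.000 for Indicators against UOM, in line with the matrix. The perfect association simply reflects that each indicator is always reported in one unit and one scale; a bias-corrected variant of the statistic would differ only negligibly at this row count.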

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

|  | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE |
| 0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00 |
| 1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00 |
| 2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00 |
| 3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00 |
| 4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00 |
| 5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00 |
| 6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38 |
| 7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00 |
| 8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00 |
| 9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00 |

|  | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE |
| 105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00 |
| 105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00 |
| 105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98 |
| 105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00 |
| 105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00 |
| 105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00 |
| 105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00 |
| 105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00 |
| 105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00 |
| 105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63 |
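
VALUE mixes several scales in one column, so rows are only comparable once SCALAR_FACTOR is applied. The sketch below works through the first three sample rows (Canada, 2010, total non-profit institutions, male employees). It assumes the SCALAR_FACTOR labels are plain multipliers and that the "average" indicators are simple ratios of the totals; neither is stated in the report, and the `SCALE` mapping is illustrative.

```python
# Assumption: SCALAR_FACTOR labels are multipliers applied to VALUE.
SCALE = {"units": 1, "thousands": 1_000, "millions": 1_000_000}

# (Indicators, SCALAR_FACTOR, VALUE) for sample rows 0 to 2.
rows = [
    ("Number of jobs", "units", 642584.00),
    ("Hours worked", "thousands", 1048516.00),
    ("Wages and salaries", "millions", 30805.00),
]

scaled = {indicator: value * SCALE[factor] for indicator, factor, value in rows}

jobs = scaled["Number of jobs"]
hours = scaled["Hours worked"]
wages = scaled["Wages and salaries"]

# Assumption: the "average" indicators are ratios of the scaled totals.
print(f"hours per job per year: {hours / jobs:.0f}")
print(f"wages per hour:         {wages / hours:.2f}")
print(f"wages per job per year: {wages / jobs:.0f}")
```

The first two ratios agree with sample rows 3 and 6 (1632 hours and 29.38 dollars). The third comes out about one dollar below the 47940 in row 5, which is consistent with the wage total being rounded to whole millions.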
Pandas Profiling Report with Columns Sorted

Overview

Dataset statistics

Number of variables 9
Number of observations 105840
Missing cells 3024
Missing cells (%) 0.3%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 7.3 MiB
Average record size in memory 72.0 B
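
The percentages and the memory total in this block follow from the raw counts. The sketch below is a minimal check; it assumes the total size is the row count multiplied by the average record size, and that all missing cells sit in the VALUE column, as the Alerts section indicates.

```python
n_rows = 105840
n_cols = 9
missing_cells = 3024
bytes_per_record = 72.0   # "Average record size in memory"

missing_share = missing_cells / (n_rows * n_cols)   # share of all cells
value_missing_share = missing_cells / n_rows        # share within VALUE alone
total_mib = n_rows * bytes_per_record / 2**20       # bytes -> MiB

print(f"missing cells:     {missing_share:.1%}")
print(f"missing in VALUE:  {value_missing_share:.1%}")
print(f"total size:        {total_mib:.1f} MiB")
```

The same 3024 cells account for 0.3% of the whole table but 2.9% of VALUE, which is why the dataset-level and variable-level figures differ.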

Variable types

Numeric 2
Categorical 7

Alerts

DGUID is highly overall correlated with GEO | High correlation
GEO is highly overall correlated with DGUID | High correlation
Indicators is highly overall correlated with UOM and 1 other fields | High correlation
UOM is highly overall correlated with Indicators | High correlation
SCALAR_FACTOR is highly overall correlated with Indicators | High correlation
VALUE has 3024 (2.9%) missing values | Missing
DGUID is uniformly distributed | Uniform
GEO is uniformly distributed | Uniform
Sector is uniformly distributed | Uniform
Characteristics is uniformly distributed | Uniform
Indicators is uniformly distributed | Uniform

Reproduction

Analysis started 2023-12-20 17:52:16.657464
Analysis finished 2023-12-20 17:52:22.584935
Duration 5.93 seconds
Software version ydata-profiling v4.5.1
Download configuration config.json

Variables

REF_DATE
Real number (ℝ)

Distinct 12
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 2015.5
Minimum 2010
Maximum 2021
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 827.0 KiB

Quantile statistics

Minimum 2010
5-th percentile 2010
Q1 2012.75
median 2015.5
Q3 2018.25
95-th percentile 2021
Maximum 2021
Range 11
Interquartile range (IQR) 5.5

Descriptive statistics

Standard deviation 3.4520688
Coefficient of variation (CV) 0.0017127605
Kurtosis -1.216784
Mean 2015.5
Median Absolute Deviation (MAD) 3
Skewness 0
Sum 2.1332052 × 10^8
Variance 11.916779
Monotonicity Increasing
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
2010 8820
8.3%
2011 8820
8.3%
2012 8820
8.3%
2013 8820
8.3%
2014 8820
8.3%
2015 8820
8.3%
2016 8820
8.3%
2017 8820
8.3%
2018 8820
8.3%
2019 8820
8.3%
Other values (2) 17640
16.7%
ValueCountFrequency (%)
2010 8820
8.3%
2011 8820
8.3%
2012 8820
8.3%
2013 8820
8.3%
2014 8820
8.3%
2015 8820
8.3%
2016 8820
8.3%
2017 8820
8.3%
2018 8820
8.3%
2019 8820
8.3%
ValueCountFrequency (%)
2021 8820
8.3%
2020 8820
8.3%
2019 8820
8.3%
2018 8820
8.3%
2017 8820
8.3%
2016 8820
8.3%
2015 8820
8.3%
2014 8820
8.3%
2013 8820
8.3%
2012 8820
8.3%
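
Because every year from 2010 to 2021 occurs exactly 8820 times, the REF_DATE summary statistics can be rebuilt from the frequency table alone. The sketch below is a minimal check using only the standard library; it assumes the reported variance is the sample variance (n − 1 in the denominator), which is what the reported digits suggest.

```python
import math
from statistics import mean, variance

# Each year from 2010 to 2021 appears 8820 times, per the frequency tables.
YEARS = range(2010, 2022)
ROWS_PER_YEAR = 8820

values = [year for year in YEARS for _ in range(ROWS_PER_YEAR)]

sample_var = variance(values)        # sample variance, n - 1 in the denominator
sample_std = math.sqrt(sample_var)

print(f"n        = {len(values)}")
print(f"mean     = {mean(values)}")
print(f"variance = {sample_var:.6f}")
print(f"std      = {sample_std:.7f}")
print(f"CV       = {sample_std / mean(values):.10f}")
```

With 12 equally frequent values the distribution is symmetric around 2015.5, which is also why the reported skewness is 0.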

DGUID
Categorical

HIGH CORRELATION  UNIFORM 

Distinct 14
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB
2016A000011124 7560
2016A000210 7560
2016A000211 7560
2016A000212 7560
2016A000213 7560
Other values (9) 68040

Length

Max length 14
Median length 11
Mean length 11.214286
Min length 11

Characters and Unicode

Total characters 1186920
Distinct characters 11
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row2016A000011124
2nd row2016A000011124
3rd row2016A000011124
4th row2016A000011124
5th row2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.1%
2016A000210 7560
 
7.1%
2016A000211 7560
 
7.1%
2016A000212 7560
 
7.1%
2016A000213 7560
 
7.1%
2016A000224 7560
 
7.1%
2016A000235 7560
 
7.1%
2016A000246 7560
 
7.1%
2016A000247 7560
 
7.1%
2016A000248 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.1%
2016a000210 7560
 
7.1%
2016a000211 7560
 
7.1%
2016a000212 7560
 
7.1%
2016a000213 7560
 
7.1%
2016a000224 7560
 
7.1%
2016a000235 7560
 
7.1%
2016a000246 7560
 
7.1%
2016a000247 7560
 
7.1%
2016a000248 7560
 
7.1%
Other values (4) 30240
28.6%

Most occurring characters

ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1081080
91.1%
Uppercase Letter 105840
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 105840
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1081080
91.1%
Latin 105840
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 446040
41.3%
2 234360
21.7%
1 173880
 
16.1%
6 136080
 
12.6%
4 37800
 
3.5%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 105840
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1186920
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 446040
37.6%
2 234360
19.7%
1 173880
 
14.6%
6 136080
 
11.5%
A 105840
 
8.9%
4 37800
 
3.2%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.6%
8 7560
 
0.6%

GEO
Categorical

HIGH CORRELATION  UNIFORM 

Distinct 14
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB
Canada 7560
Newfoundland and Labrador 7560
Prince Edward Island 7560
Nova Scotia 7560
New Brunswick 7560
Other values (9) 68040

Length

Max length 25
Median length 14.5
Mean length 11.714286
Min length 5

Characters and Unicode

Total characters 1239840
Distinct characters 34
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowCanada
2nd rowCanada
3rd rowCanada
4th rowCanada
5th rowCanada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.1%
Newfoundland and Labrador 7560
 
7.1%
Prince Edward Island 7560
 
7.1%
Nova Scotia 7560
 
7.1%
New Brunswick 7560
 
7.1%
Quebec 7560
 
7.1%
Ontario 7560
 
7.1%
Manitoba 7560
 
7.1%
Saskatchewan 7560
 
7.1%
Alberta 7560
 
7.1%
Other values (4) 30240
28.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.5%
newfoundland 7560
 
4.5%
territories 7560
 
4.5%
northwest 7560
 
4.5%
yukon 7560
 
4.5%
columbia 7560
 
4.5%
british 7560
 
4.5%
alberta 7560
 
4.5%
saskatchewan 7560
 
4.5%
manitoba 7560
 
4.5%
Other values (12) 90720
54.5%

Most occurring characters

ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1020600
82.3%
Uppercase Letter 158760
 
12.8%
Space Separator 60480
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 151200
14.8%
n 90720
 
8.9%
r 90720
 
8.9%
t 75600
 
7.4%
e 75600
 
7.4%
o 75600
 
7.4%
i 75600
 
7.4%
d 60480
 
5.9%
u 52920
 
5.2%
s 45360
 
4.4%
Other values (9) 226800
22.2%
Uppercase Letter
ValueCountFrequency (%)
N 37800
23.8%
B 15120
 
9.5%
S 15120
 
9.5%
C 15120
 
9.5%
I 7560
 
4.8%
E 7560
 
4.8%
P 7560
 
4.8%
L 7560
 
4.8%
Q 7560
 
4.8%
O 7560
 
4.8%
Other values (4) 30240
19.0%
Space Separator
ValueCountFrequency (%)
60480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1179360
95.1%
Common 60480
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 151200
 
12.8%
n 90720
 
7.7%
r 90720
 
7.7%
t 75600
 
6.4%
e 75600
 
6.4%
o 75600
 
6.4%
i 75600
 
6.4%
d 60480
 
5.1%
u 52920
 
4.5%
s 45360
 
3.8%
Other values (23) 385560
32.7%
Common
ValueCountFrequency (%)
60480
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1239840
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 151200
 
12.2%
n 90720
 
7.3%
r 90720
 
7.3%
t 75600
 
6.1%
e 75600
 
6.1%
o 75600
 
6.1%
i 75600
 
6.1%
d 60480
 
4.9%
60480
 
4.9%
u 52920
 
4.3%
Other values (24) 430920
34.8%

Sector
Categorical

UNIFORM 

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB
Total non-profit institutions 21168
Total non-profit institutions excluding governments 21168
Non-profit institutions serving households (community organizations) 21168
Business non-profit institutions 21168
Government non-profit institutions 21168

Length

Max length 68
Median length 34
Mean length 42.8
Min length 29

Characters and Unicode

Total characters 4529952
Distinct characters 29
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowTotal non-profit institutions
2nd rowTotal non-profit institutions
3rd rowTotal non-profit institutions
4th rowTotal non-profit institutions
5th rowTotal non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.0%
Total non-profit institutions excluding governments 21168
20.0%
Non-profit institutions serving households (community organizations) 21168
20.0%
Business non-profit institutions 21168
20.0%
Government non-profit institutions 21168
20.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 105840
25.0%
institutions 105840
25.0%
total 42336
 
10.0%
excluding 21168
 
5.0%
governments 21168
 
5.0%
serving 21168
 
5.0%
households 21168
 
5.0%
community 21168
 
5.0%
organizations 21168
 
5.0%
business 21168
 
5.0%

Most occurring characters

ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3958416
87.4%
Space Separator 317520
 
7.0%
Dash Punctuation 105840
 
2.3%
Uppercase Letter 105840
 
2.3%
Open Punctuation 21168
 
0.5%
Close Punctuation 21168
 
0.5%
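
The category counts above come from the Unicode general category of each character. The sketch below shows one way to reproduce them for Sector with Python's standard `unicodedata` module, using the five labels and the shared row count from the report. Note that `unicodedata.category` returns two-letter abbreviations (`Ll`, `Zs`, `Pd`, `Lu`, `Ps`, `Pe`) rather than the long names shown in the table; whether the report uses this module internally is not something the text confirms.

```python
import unicodedata
from collections import Counter

# The five Sector labels and their shared row count, copied from the report.
SECTORS = [
    "Total non-profit institutions",
    "Total non-profit institutions excluding governments",
    "Non-profit institutions serving households (community organizations)",
    "Business non-profit institutions",
    "Government non-profit institutions",
]
ROWS_PER_SECTOR = 21168

category_counts = Counter()
for label in SECTORS:
    for char in label:
        # unicodedata.category returns a two-letter general category, e.g. "Ll".
        category_counts[unicodedata.category(char)] += ROWS_PER_SECTOR

total_chars = sum(category_counts.values())
for category, count in category_counts.most_common():
    print(f"{category}: {count} ({count / total_chars:.1%})")
```

Here `Ll` is Lowercase Letter, `Zs` Space Separator, `Pd` Dash Punctuation, `Lu` Uppercase Letter, `Ps` Open Punctuation and `Pe` Close Punctuation.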

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 613872
15.5%
i 550368
13.9%
t 550368
13.9%
o 508032
12.8%
s 381024
9.6%
r 190512
 
4.8%
u 190512
 
4.8%
e 169344
 
4.3%
f 105840
 
2.7%
p 105840
 
2.7%
Other values (11) 592704
15.0%
Uppercase Letter
ValueCountFrequency (%)
T 42336
40.0%
N 21168
20.0%
B 21168
20.0%
G 21168
20.0%
Space Separator
ValueCountFrequency (%)
317520
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 105840
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21168
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21168
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4064256
89.7%
Common 465696
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 613872
15.1%
i 550368
13.5%
t 550368
13.5%
o 508032
12.5%
s 381024
9.4%
r 190512
 
4.7%
u 190512
 
4.7%
e 169344
 
4.2%
f 105840
 
2.6%
p 105840
 
2.6%
Other values (15) 698544
17.2%
Common
ValueCountFrequency (%)
317520
68.2%
- 105840
 
22.7%
( 21168
 
4.5%
) 21168
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4529952
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 613872
13.6%
i 550368
12.1%
t 550368
12.1%
o 508032
11.2%
s 381024
 
8.4%
317520
 
7.0%
r 190512
 
4.2%
u 190512
 
4.2%
e 169344
 
3.7%
f 105840
 
2.3%
Other values (19) 952560
21.0%

Characteristics
Categorical

UNIFORM 

Distinct 18
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB
Male employees 5880
Female employees 5880
Immigrant employees 5880
Non-immigrant employees 5880
Indigenous identity employees 5880
Other values (13) 76440

Length

Max length 33
Median length 28
Mean length 19.5
Min length 14

Characters and Unicode

Total characters 2063880
Distinct characters 37
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowMale employees
2nd rowMale employees
3rd rowMale employees
4th rowMale employees
5th rowMale employees

Common Values

ValueCountFrequency (%)
Male employees 5880
 
5.6%
Female employees 5880
 
5.6%
Immigrant employees 5880
 
5.6%
Non-immigrant employees 5880
 
5.6%
Indigenous identity employees 5880
 
5.6%
Non-indigenous identity employees 5880
 
5.6%
Visible minority 5880
 
5.6%
Not a visible minority 5880
 
5.6%
High school diploma and less 5880
 
5.6%
Trade certificate 5880
 
5.6%
Other values (8) 47040
44.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
employees 35280
 
10.3%
years 35280
 
10.3%
to 29400
 
8.6%
and 17640
 
5.2%
identity 11760
 
3.4%
visible 11760
 
3.4%
minority 11760
 
3.4%
diploma 11760
 
3.4%
male 5880
 
1.7%
34 5880
 
1.7%
Other values (28) 164640
48.3%

Most occurring characters

ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1617000
78.3%
Space Separator 235200
 
11.4%
Decimal Number 129360
 
6.3%
Uppercase Letter 70560
 
3.4%
Dash Punctuation 11760
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 264600
16.4%
i 152880
9.5%
o 147000
9.1%
s 117600
 
7.3%
a 105840
 
6.5%
l 99960
 
6.2%
y 99960
 
6.2%
t 99960
 
6.2%
r 94080
 
5.8%
n 94080
 
5.8%
Other values (10) 341040
21.1%
Uppercase Letter
ValueCountFrequency (%)
N 17640
25.0%
I 11760
16.7%
H 5880
 
8.3%
T 5880
 
8.3%
C 5880
 
8.3%
U 5880
 
8.3%
V 5880
 
8.3%
F 5880
 
8.3%
M 5880
 
8.3%
Decimal Number
ValueCountFrequency (%)
5 47040
36.4%
4 41160
31.8%
2 11760
 
9.1%
3 11760
 
9.1%
6 11760
 
9.1%
1 5880
 
4.5%
Space Separator
ValueCountFrequency (%)
235200
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11760
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1687560
81.8%
Common 376320
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 264600
15.7%
i 152880
 
9.1%
o 147000
 
8.7%
s 117600
 
7.0%
a 105840
 
6.3%
l 99960
 
5.9%
y 99960
 
5.9%
t 99960
 
5.9%
r 94080
 
5.6%
n 94080
 
5.6%
Other values (19) 411600
24.4%
Common
ValueCountFrequency (%)
235200
62.5%
5 47040
 
12.5%
4 41160
 
10.9%
2 11760
 
3.1%
3 11760
 
3.1%
- 11760
 
3.1%
6 11760
 
3.1%
1 5880
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2063880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 264600
12.8%
235200
 
11.4%
i 152880
 
7.4%
o 147000
 
7.1%
s 117600
 
5.7%
a 105840
 
5.1%
l 99960
 
4.8%
y 99960
 
4.8%
t 99960
 
4.8%
r 94080
 
4.6%
Other values (27) 646800
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct 7
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB
Number of jobs 15120
Hours worked 15120
Wages and salaries 15120
Average annual hours worked 15120
Average weekly hours worked 15120
Other values (2) 30240

Length

Max length 33
Median length 19
Mean length 21.428571
Min length 12

Characters and Unicode

Total characters 2268000
Distinct characters 25
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowNumber of jobs
2nd rowHours worked
3rd rowWages and salaries
4th rowAverage annual hours worked
5th rowAverage weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 15120
14.3%
Hours worked 15120
14.3%
Wages and salaries 15120
14.3%
Average annual hours worked 15120
14.3%
Average weekly hours worked 15120
14.3%
Average annual wages and salaries 15120
14.3%
Average hourly wage 15120
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 60480
16.7%
hours 45360
12.5%
worked 45360
12.5%
wages 30240
8.3%
and 30240
8.3%
salaries 30240
8.3%
annual 30240
8.3%
number 15120
 
4.2%
of 15120
 
4.2%
jobs 15120
 
4.2%
Other values (3) 45360
12.5%

Most occurring characters

ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1905120
84.0%
Space Separator 257040
 
11.3%
Uppercase Letter 105840
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 287280
15.1%
a 257040
13.5%
r 211680
11.1%
s 151200
 
7.9%
o 136080
 
7.1%
u 105840
 
5.6%
g 105840
 
5.6%
n 90720
 
4.8%
l 90720
 
4.8%
w 90720
 
4.8%
Other values (10) 378000
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 60480
57.1%
W 15120
 
14.3%
H 15120
 
14.3%
N 15120
 
14.3%
Space Separator
ValueCountFrequency (%)
257040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2010960
88.7%
Common 257040
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 287280
14.3%
a 257040
12.8%
r 211680
10.5%
s 151200
 
7.5%
o 136080
 
6.8%
u 105840
 
5.3%
g 105840
 
5.3%
n 90720
 
4.5%
l 90720
 
4.5%
w 90720
 
4.5%
Other values (14) 483840
24.1%
Common
ValueCountFrequency (%)
257040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2268000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 287280
12.7%
a 257040
11.3%
257040
11.3%
r 211680
 
9.3%
s 151200
 
6.7%
o 136080
 
6.0%
u 105840
 
4.7%
g 105840
 
4.7%
n 90720
 
4.0%
l 90720
 
4.0%
Other values (15) 574560
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB
Hours 45360
Dollars 45360
Jobs 15120

Length

Max length 7
Median length 5
Mean length 5.7142857
Min length 4

Characters and Unicode

Total characters 604800
Distinct characters 10
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowJobs
2nd rowHours
3rd rowDollars
4th rowHours
5th rowHours

Common Values

ValueCountFrequency (%)
Hours 45360
42.9%
Dollars 45360
42.9%
Jobs 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 45360
42.9%
dollars 45360
42.9%
jobs 15120
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 498960
82.5%
Uppercase Letter 105840
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 105840
21.2%
s 105840
21.2%
r 90720
18.2%
l 90720
18.2%
u 45360
9.1%
a 45360
9.1%
b 15120
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 45360
42.9%
D 45360
42.9%
J 15120
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 604800
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 604800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 105840
17.5%
s 105840
17.5%
r 90720
15.0%
l 90720
15.0%
H 45360
7.5%
u 45360
7.5%
D 45360
7.5%
a 45360
7.5%
J 15120
 
2.5%
b 15120
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 827.0 KiB
units 75600
thousands 15120
millions 15120

Length

Max length 9
Median length 5
Mean length 6
Min length 5

Characters and Unicode

Total characters 635040
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowunits
2nd rowthousands
3rd rowmillions
4th rowunits
5th rowunits

Common Values

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 75600
71.4%
thousands 15120
 
14.3%
millions 15120
 
14.3%

Most occurring characters

Value  Count   Frequency (%)
s      120960  19.0%
n      105840  16.7%
i      105840  16.7%
u      90720   14.3%
t      90720   14.3%
o      30240   4.8%
l      30240   4.8%
h      15120   2.4%
a      15120   2.4%
d      15120   2.4%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  635040  100.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  635040  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  635040  100.0%

Most frequent character per category, per script, and per block

All characters belong to the single category (Lowercase Letter), script (Latin), and block (ASCII) listed above, so each of these three breakdowns is identical to the "Most occurring characters" table.

VALUE
Real number (ℝ)

MISSING 

Distinct      35398
Distinct (%)  34.4%
Missing       3024
Missing (%)   2.9%
Infinite      0
Infinite (%)  0.0%
Mean          26419.543
Minimum       0
Maximum       3857813
Zeros         9
Zeros (%)     < 0.1%
Negative      0
Negative (%)  0.0%
Memory size   827.0 KiB

Quantile statistics

Minimum                    0
5-th percentile            18.6075
Q1                         32
Median                     1386
Q3                         17660
95-th percentile           85576.5
Maximum                    3857813
Range                      3857813
Interquartile range (IQR)  17628

Descriptive statistics

Standard deviation               118041.35
Coefficient of variation (CV)    4.4679558
Kurtosis                         266.93641
Mean                             26419.543
Median Absolute Deviation (MAD)  1358
Skewness                         13.831207
Sum                              2.7163518 × 10^9
Variance                         1.393376 × 10^10
Monotonicity                     Not monotonic
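
Several of the figures above are derived from one another, which makes them easy to cross-check. The following sketch recomputes the coefficient of variation, variance, interquartile range, range and sum from the other reported values. It assumes, as is the usual pandas behaviour, that the mean and sum are taken over non-missing observations only; the variable names are chosen for this example and are not part of the report.

```python
# Figures copied from the VALUE summary in this report.
n_rows = 105840
n_missing = 3024
mean = 26419.543
std = 118041.35
q1, q3 = 32, 17660
minimum, maximum = 0, 3857813

n_valid = n_rows - n_missing      # observations with a non-missing VALUE
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
total = mean * n_valid            # sum, assuming missing values are skipped

print(f"CV        {cv:.7f}")
print(f"Variance  {variance:.6e}")
print(f"IQR       {iqr}")
print(f"Range     {value_range}")
print(f"Sum       {total:.7e}")
```

Small differences in the last digits are expected because the inputs are themselves rounded.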
Histogram with fixed size bins (bins=50)
Common values

Value                 Count  Frequency (%)
30                    2040   1.9%
31                    1949   1.8%
32                    1844   1.7%
29                    1562   1.5%
33                    1476   1.4%
34                    1005   0.9%
28                    905    0.9%
35                    638    0.6%
27                    523    0.5%
36                    457    0.4%
Other values (35388)  90417  85.4%
(Missing)             3024   2.9%

Smallest 10 values

Value  Count  Frequency (%)
0      9      < 0.1%
1      219    0.2%
2      299    0.3%
3      201    0.2%
4      203    0.2%
5      157    0.1%
6      147    0.1%
7      135    0.1%
8      179    0.2%
9      145    0.1%

Largest 10 values

Value    Count  Frequency (%)
3857813  1      < 0.1%
3690238  1      < 0.1%
3643952  1      < 0.1%
3581317  1      < 0.1%
3556730  1      < 0.1%
3544822  1      < 0.1%
3487265  1      < 0.1%
3425218  1      < 0.1%
3402343  1      < 0.1%
3372266  1      < 0.1%

Interactions

[Figures: four interaction plots]

Correlations

                 REF_DATE  VALUE  DGUID  GEO    Sector  Characteristics  Indicators  UOM    SCALAR_FACTOR
REF_DATE         1.000     0.027  0.000  0.000  0.000   0.000            0.000       0.000  0.000
VALUE            0.027     1.000  0.090  0.090  0.051   0.044            0.075       0.081  0.105
DGUID            0.000     0.090  1.000  1.000  0.000   0.000            0.000       0.000  0.000
GEO              0.000     0.090  1.000  1.000  0.000   0.000            0.000       0.000  0.000
Sector           0.000     0.051  0.000  0.000  1.000   0.000            0.000       0.000  0.000
Characteristics  0.000     0.044  0.000  0.000  0.000   1.000            0.000       0.000  0.000
Indicators       0.000     0.075  0.000  0.000  0.000   0.000            1.000       1.000  1.000
UOM              0.000     0.081  0.000  0.000  0.000   0.000            1.000       1.000  0.447
SCALAR_FACTOR    0.000     0.105  0.000  0.000  0.000   0.000            1.000       0.447  1.000
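
The 1.000 and 0.447 entries among Indicators, UOM and SCALAR_FACTOR can be reproduced by hand if one assumes that the categorical pairs are scored with Cramér's V, which this report does not state explicitly. The sketch below is a plain-Python, uncorrected Cramér's V, not the profiler's own code. It also assumes that every indicator occurs equally often and always carries the UOM and SCALAR_FACTOR shown in the sample rows of this report, so one row per indicator is enough (the uncorrected statistic does not change when all counts are scaled by the same factor). The function name `cramers_v` and the `rows` list are illustrative.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    # phi^2 = chi^2 / n, written as sum(O^2 / (row * col)) - 1
    phi2 = sum(o * o / (row_totals[x] * col_totals[y])
               for (x, y), o in joint.items()) - 1.0
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(max(phi2, 0.0) / k)

# (Indicators, UOM, SCALAR_FACTOR) combinations taken from the sample rows.
rows = [
    ("Number of jobs", "Jobs", "units"),
    ("Hours worked", "Hours", "thousands"),
    ("Wages and salaries", "Dollars", "millions"),
    ("Average annual hours worked", "Hours", "units"),
    ("Average weekly hours worked", "Hours", "units"),
    ("Average annual wages and salaries", "Dollars", "units"),
    ("Average hourly wage", "Dollars", "units"),
]
indicators, uom, scalar = zip(*rows)

v_ind_uom = cramers_v(indicators, uom)
v_ind_scalar = cramers_v(indicators, scalar)
v_uom_scalar = cramers_v(uom, scalar)

print(f"Indicators vs UOM            {v_ind_uom:.3f}")
print(f"Indicators vs SCALAR_FACTOR  {v_ind_scalar:.3f}")
print(f"UOM vs SCALAR_FACTOR         {v_uom_scalar:.3f}")
```

Under these assumptions the perfect scores simply reflect that each indicator fixes its unit and scale, while UOM and SCALAR_FACTOR only partly determine each other.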

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
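
For reference, the two missing-value percentages quoted in this report (2.9% of VALUE and 0.3% of all cells) follow directly from the counts in the overview. The short sketch below redoes that arithmetic; the variable names are illustrative, and it relies on VALUE being the only column with missing entries, as the report indicates.

```python
# Counts copied from the dataset statistics in this report.
n_rows = 105840
n_columns = 9
missing_cells = 3024  # all of them are in the VALUE column

pct_of_value_column = 100 * missing_cells / n_rows
pct_of_all_cells = 100 * missing_cells / (n_rows * n_columns)

print(f"missing in VALUE:      {pct_of_value_column:.1f}%")
print(f"missing over all cells: {pct_of_all_cells:.1f}%")
```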

Sample

First rows

   REF_DATE  DGUID           GEO     Sector                         Characteristics   Indicators                         UOM      SCALAR_FACTOR  VALUE
0  2010      2016A000011124  Canada  Total non-profit institutions  Male employees    Number of jobs                     Jobs     units          642584.00
1  2010      2016A000011124  Canada  Total non-profit institutions  Male employees    Hours worked                       Hours    thousands      1048516.00
2  2010      2016A000011124  Canada  Total non-profit institutions  Male employees    Wages and salaries                 Dollars  millions       30805.00
3  2010      2016A000011124  Canada  Total non-profit institutions  Male employees    Average annual hours worked        Hours    units          1632.00
4  2010      2016A000011124  Canada  Total non-profit institutions  Male employees    Average weekly hours worked        Hours    units          31.00
5  2010      2016A000011124  Canada  Total non-profit institutions  Male employees    Average annual wages and salaries  Dollars  units          47940.00
6  2010      2016A000011124  Canada  Total non-profit institutions  Male employees    Average hourly wage                Dollars  units          29.38
7  2010      2016A000011124  Canada  Total non-profit institutions  Female employees  Number of jobs                     Jobs     units          1500394.00
8  2010      2016A000011124  Canada  Total non-profit institutions  Female employees  Hours worked                       Hours    thousands      2331018.00
9  2010      2016A000011124  Canada  Total non-profit institutions  Female employees  Wages and salaries                 Dollars  millions       60943.00

Last rows

        REF_DATE  DGUID        GEO      Sector                              Characteristics        Indicators                         UOM      SCALAR_FACTOR  VALUE
105830  2021      2016A000262  Nunavut  Government non-profit institutions  55 to 64 years         Average weekly hours worked        Hours    units          33.00
105831  2021      2016A000262  Nunavut  Government non-profit institutions  55 to 64 years         Average annual wages and salaries  Dollars  units          101380.00
105832  2021      2016A000262  Nunavut  Government non-profit institutions  55 to 64 years         Average hourly wage                Dollars  units          59.98
105833  2021      2016A000262  Nunavut  Government non-profit institutions  65 years old and over  Number of jobs                     Jobs     units          27.00
105834  2021      2016A000262  Nunavut  Government non-profit institutions  65 years old and over  Hours worked                       Hours    thousands      30.00
105835  2021      2016A000262  Nunavut  Government non-profit institutions  65 years old and over  Wages and salaries                 Dollars  millions       2.00
105836  2021      2016A000262  Nunavut  Government non-profit institutions  65 years old and over  Average annual hours worked        Hours    units          1111.00
105837  2021      2016A000262  Nunavut  Government non-profit institutions  65 years old and over  Average weekly hours worked        Hours    units          21.00
105838  2021      2016A000262  Nunavut  Government non-profit institutions  65 years old and over  Average annual wages and salaries  Dollars  units          74037.00
105839  2021      2016A000262  Nunavut  Government non-profit institutions  65 years old and over  Average hourly wage                Dollars  units          66.63
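
Because VALUE is stored on different scales, rows are only comparable after the scale is applied. The sketch below assumes that SCALAR_FACTOR is a plain multiplier for VALUE (units = 1, thousands = 1,000, millions = 1,000,000), which is the usual reading of this column but is not spelled out in the report. The `SCALE` mapping and the `to_absolute` helper are hypothetical names for this example. It uses the first three sample rows (Canada, total non-profit institutions, male employees, 2010) and compares the derived averages with the ones reported in rows 3, 5 and 6.

```python
# Assumed multipliers for the three SCALAR_FACTOR categories.
SCALE = {"units": 1, "thousands": 1_000, "millions": 1_000_000}

def to_absolute(value, scalar_factor):
    """Return VALUE expressed in base units (hypothetical helper)."""
    return value * SCALE[scalar_factor]

# Sample rows 0 to 2 of this report.
jobs = to_absolute(642584.00, "units")          # number of jobs
hours = to_absolute(1048516.00, "thousands")    # hours worked
wages = to_absolute(30805.00, "millions")       # wages and salaries, dollars

annual_hours_per_job = hours / jobs   # compare with row 3 (1632.00)
annual_wage_per_job = wages / jobs    # compare with row 5 (47940.00)
hourly_wage = wages / hours           # compare with row 6 (29.38)

print(f"average annual hours worked:       {annual_hours_per_job:.0f}")
print(f"average annual wages and salaries: {annual_wage_per_job:.0f}")
print(f"average hourly wage:               {hourly_wage:.2f}")
```

The derived values agree with the published averages up to rounding, since the totals are themselves rounded to thousands or millions.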